2022-09-28

Regex matching mixed string segments containing operator, string designator, and curly-brace group

I am looking for a C# regex solution to match and capture some small but complex chunks of data. I have thousands of unstructured chunks of data in my database (they come from a third-party data store) that look similar to this:

not BATTCOMPAR{275} and FORKCARRIA{ForkSpreader} and SIDESHIFT{WithSSPassAttachCenterLine} and TILTANGLE{4up_2down} and not AUTOMATSS{true} and not FORKLASGUI{true} and not FORKCAMSYS{true} and OKED{true}

I want to be able to split that up into discrete pieces (regex match/capture) like the following:

not BATTCOMPAR{275} 
and FORKCARRIA{ForkSpreader} 
and SIDESHIFT{WithSSPassAttachCenterLine} 
and TILTANGLE{4up_2down} 
and not AUTOMATSS{true} 
and not FORKLASGUI{true} 
and not FORKCAMSYS{true} 
and OKED{true}

The data will always conform to the following rules:

  • At the end of each chunk of data there will be a string enclosed by curly braces, like this: {275}
  • The "curly brace grouping" will always come at the end of a string beginning with one of three operators ("not", "and", or "and not") or with nothing. The "nothing" case is equivalent to "and" and will only occur when it's the first chunk in the string. For example, if my "and OKED{true}" had come at the beginning of the string, the "and" would have been omitted and OKED{true} would have been prefixed by nothing (empty string). But it's the same as an "and".
  • After the operator ("and", "not", "and not", or nothing) there will always be a string designator that ends just before the curly brace grouping. Example: BATTCOMPAR
  • It appears that the string designator will always touch the curly brace grouping with no space in between but I'm not 100% sure. The regex should accommodate the scenario in which a space might come between the string designator and the left curly brace.
  • Summary #1 of above points: each chunk will have 3 distinct sub-groups: operator (such as and not), string designator (such as BATTCOMPAR), and curly brace grouping (such as {ForkSpreader}).
  • Summary #2 of above points: each chunk will begin with one of the 3 listed operators, or nothing, and end with a right-curly-brace. It is guaranteed that only 1 left-curly-brace and only 1 right-curly-brace will exist within the entire segment, and they will always be grouped together at the end of the segment. There is no fear of encountering additional/stray curly braces in other parts of the segment.

I have experimented with a few different regex constructions:

Match curly brace groupings:

Regex regex = new Regex(@"{(.*?)}");
return regex.Matches(str);

The above almost works, but it returns only the curly brace groupings and not the operator and string designator that go with each one.

Capture chunks based on string prefix, trying to match operator strings:

var capturedWords = new List<string>();
string regex = $@"(?<!\w){prefix}\w+";

foreach ( Match match in Regex.Matches(haystack, regex) ) {
    capturedWords.Add(match.Value);
}

return capturedWords;

The above partially works, but it gets only the operators and not the entire chunk I need (operator + string designator + curly brace grouping).
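
To make the target concrete, here is a rough sketch of the kind of single pattern I imagine the rules above call for, with one named group per sub-group. It is not a verified solution. The names ChunkParser, Parse, ChunkPattern and the group names op, name and value are hypothetical, and the sketch assumes that the string designator consists only of word characters (\w), that operators are always lowercase, and that a missing operator should be reported as "and".

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ChunkParser
{
    // Hypothetical pattern (an assumption, not a tested production regex):
    //   op    : optional "and not", "and", or "not" ("and not" is tried first)
    //   name  : the string designator, assumed to be word characters only
    //   \s*   : tolerates an optional space before the left curly brace
    //   value : everything between the single pair of curly braces
    private static readonly Regex ChunkPattern = new Regex(
        @"(?<op>\band\s+not\b|\band\b|\bnot\b)?\s*(?<name>\w+)\s*\{(?<value>[^{}]*)\}");

    public static List<(string Op, string Name, string Value)> Parse(string input)
    {
        var chunks = new List<(string Op, string Name, string Value)>();

        foreach (Match m in ChunkPattern.Matches(input))
        {
            // A missing operator (first chunk only) is treated as "and",
            // following the rule stated in the question.
            string op = m.Groups["op"].Success ? m.Groups["op"].Value : "and";
            chunks.Add((op, m.Groups["name"].Value, m.Groups["value"].Value));
        }

        return chunks;
    }
}
```

Calling ChunkParser.Parse on the sample line above would be expected to yield one tuple per chunk, and m.Value inside the loop would hold the whole chunk (operator + string designator + curly brace grouping) if the raw text is needed instead of the three parts.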

Thanks in advance for any help.


