2024-01-20

grep - RegEx multiple-criteria select

Given a file containing this string:

IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@IT1*1*CS*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@

The goal is to extract the following:

IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@

With the criteria being:

  1. The IT1 "line" must contain *EA*
  2. The REF line must contain BAR

Some notes for consideration:

  • "@" can be thought of as a line break
  • A "group" starts with an IT1 line and ends with the next REF line
  • I am running GNU grep 3.7.

The goal is to select the "group" of lines meeting the criteria.

I tried the following:

grep -oP "IT1[^@]*EA[^@]*@.*REF[^@]*BAR[^@]*@" file.txt

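But the match starts at the first IT1 in the file instead of at the group I want, because the greedy .* also matches across "@" and runs on to the last REF containing BAR.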
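
One possible way to keep the match inside a single group is to replace .* with a repeated "segment" that is not allowed to start a new IT1. The following is only a sketch under the assumptions stated in the notes above (every group starts with IT1, every segment ends in "@"), and it assumes grep was built with -P support. The first line just recreates the sample input, and the variable name result is only for illustration:

```shell
# Recreate the sample input from the question (one long line, "@" as the segment separator).
printf '%s\n' 'IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@IT1*1*CS*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@' > file.txt

# 1) an IT1 segment containing *EA*
# 2) as few following segments as possible, none of which may begin with IT1
# 3) a REF segment containing BAR
result=$(grep -oP 'IT1[^@]*\*EA\*[^@]*@(?:(?!IT1)[^@]*@)*?REF[^@]*BAR[^@]*@' file.txt)
printf '%s\n' "$result"
```

Because the (?!IT1) check stops the middle part at the next IT1, a group whose REF does not contain BAR cannot borrow the REF of a later group, so the first group in the sample is rejected rather than stretched.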

Also tried to use lookarounds:

grep -oP "(?<=IT1[^@]*EA[^@]*@).*?(?=REF[^@]*BAR[^@]*@)" file.txt

But my version of grep returns:

grep: lookbehind assertion is not fixed length
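
The error comes from PCRE itself: a lookbehind has to match a fixed number of characters, and [^@]* does not. An alternative that avoids lookarounds altogether (a different technique from the grep attempts above, shown only as a sketch) is to turn each "@" into a real line break and let awk collect the groups. The variable names group and keep are made up for this example:

```shell
# Recreate the sample input from the question.
printf '%s\n' 'IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@IT1*1*CS*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*BAR@IT1*1*EA*VN*ABC@SAC*X*500@REF*ZZ*OK@' > file.txt

result=$(tr '@' '\n' < file.txt | awk '
  /^IT1/ { group = ""; keep = ($0 ~ /\*EA\*/) }   # a new group starts; remember whether IT1 has *EA*
         { group = group $0 "@" }                 # append every segment, restoring its "@"
  /^REF/ { if (keep && $0 ~ /BAR/) print group    # group ends; print it only if both criteria hold
           group = ""; keep = 0 }
')
printf '%s\n' "$result"
```

This trades the single regular expression for a small state machine, which may be easier to extend if more criteria are added later.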


