2023-11-24

Regex: anonymise all matches per line, where one match is mandatory and the others are optional

I've researched this quite heavily now, but nothing seems to be getting me close.

Below is an excerpt of a csv file. I need to anonymise certain lines where a match is found for an email address. Once a match is found, I also need to anonymise certain other fields that might be present on the same line.

I read that ? makes the preceding token optional, so I thought it would be relatively easy to specify an optional group and a mandatory group, but I can't get it to work.

This is the example data:

test1,rod.p@nono.com,bbb,123456789,987654321,aaa,121
test2,aaa,rod.p@yes.com,123456789,aaa,bbb,987654321,122,rod.p@yes.com,aaa,123456
test3,rod.p@yesyes.com,123456789,987654321,aaa,123

Based on the syntax below, I need only the line test2 to be matched, and specifically these parts:

aaa [optional as long as the email address has been matched on the same line]
bbb [optional as long as the email address has been matched on the same line]
rod.p@yes.com [mandatory]

(please note the email address may appear more than once)

The pattern below highlights the right parts, but it also selects the aaa and bbb on the other rows, which don't have the correct email address.

(aaa|bbb)?(rod\.p@yes\.com)?
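
To make the problem concrete, here is a minimal sketch that reproduces it outside regexr. It uses Python's re module, which is an assumption on my part (regexr itself runs JavaScript or PCRE), and the names SAMPLE, PATTERN and nonempty_matches are made up for illustration. Because both groups are optional, nothing ties aaa or bbb to the email address, so they match on every line.

```python
import re

# The three example lines from the post.
SAMPLE = """test1,rod.p@nono.com,bbb,123456789,987654321,aaa,121
test2,aaa,rod.p@yes.com,123456789,aaa,bbb,987654321,122,rod.p@yes.com,aaa,123456
test3,rod.p@yesyes.com,123456789,987654321,aaa,123"""

# Both groups are optional, so each one can match on its own.
PATTERN = re.compile(r"(aaa|bbb)?(rod\.p@yes\.com)?")


def nonempty_matches(line):
    """Return every non-empty match of PATTERN in one line."""
    # The pattern can also match the empty string at any position,
    # so the empty matches are filtered out here.
    return [m.group(0) for m in PATTERN.finditer(line) if m.group(0)]


for line in SAMPLE.splitlines():
    print(line.split(",")[0], nonempty_matches(line))
# test1 ['bbb', 'aaa']
# test2 ['aaa', 'rod.p@yes.com', 'aaa', 'bbb', 'rod.p@yes.com', 'aaa']
# test3 ['aaa']
```

The test2 result is what I want; the hits on test1 and test3 are the unwanted ones.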

I then realised that I need to define a start and an end, using ^ and $, but this is where I'm getting stuck, and nothing I try makes it work.

^(aaa|bbb)?.*(rod\.p@yes\.com).*$

This matches the whole line of test2 (I guess this is because of the '.*'), but I need to match only the individual parts so that I can replace them with the word anonymised. I've tried various things but haven't managed to get it working yet. Any guidance would be much appreciated. Thanks.
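
For reference, the result I'm after can be sketched in two steps instead of a single pattern: first check whether the line contains the email address, then replace the individual fields on that line only. This is only a sketch under assumptions, not a confirmed answer. It uses Python's re module rather than regexr, the names HAS_EMAIL, FIELD and anonymise are hypothetical, and it assumes each value should match a whole comma-separated field.

```python
import re

SAMPLE = """test1,rod.p@nono.com,bbb,123456789,987654321,aaa,121
test2,aaa,rod.p@yes.com,123456789,aaa,bbb,987654321,122,rod.p@yes.com,aaa,123456
test3,rod.p@yesyes.com,123456789,987654321,aaa,123"""

EMAIL = r"rod\.p@yes\.com"

# Assumption: a value counts only when it fills a whole CSV field, i.e. it is
# preceded by a comma or the line start and followed by a comma or the line end.
HAS_EMAIL = re.compile(rf"(?<![^,]){EMAIL}(?![^,])")
FIELD = re.compile(rf"(?<![^,])(?:aaa|bbb|{EMAIL})(?![^,])")


def anonymise(text):
    """Replace aaa, bbb and the email, but only on lines containing the email."""
    out = []
    for line in text.splitlines():
        if HAS_EMAIL.search(line):                 # step 1: mandatory match
            line = FIELD.sub("anonymised", line)   # step 2: replace each part
        out.append(line)
    return "\n".join(out)


print(anonymise(SAMPLE))
```

With the example data, only the test2 line changes; test1 and test3 are printed as they were.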

PS: I'm testing this using regexr.com/ with the multiline and global flags enabled.


