2021-12-20

RegEx in VSCode: capture every character/letter - not just ASCII

I am working with historical text and want to reformat it with RegEx. The problem is that the text contains many special characters (that is, letters) that are not matched by RegEx character classes like [a-z] / [A-Z] or \w. For example, I want to match the dot (and only the dot) in the following line:

<tag1>Quomodo restituendus locus Demosth. Olÿnth</tag1>

Without the ÿ I could easily work with the mentioned character classes, like:

(?<=(<tag1>(\w|\s)*))\.(?=((\w|\s)*</tag1>))

But this does not work with letters outside ASCII. I have tried many things, but I cannot get the RegEx to capture only the dot in this one line. If I use a more general expression like (.)* (instead of (\w|\s)*), I also match many other dots in the document (for example, dots that are not between an opening and a closing tag but between two such tag sets), which is not what I want. Any ideas for an expression that covers all Unicode letters?
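
To make the problem concrete, here is a minimal TypeScript sketch. It assumes VS Code's in-editor search follows JavaScript regular-expression semantics, where \w is ASCII-only and Unicode property escapes such as \p{L} need the u flag. Whether the search box itself accepts \p{L} may depend on the VS Code version. The helper name dotOffsets and the second test line are made up for illustration. The raw Unicode-aware pattern being tried is (?<=<tag1>[\p{L}\p{M}\p{N}_\s]*)\.(?=[\p{L}\p{M}\p{N}_\s]*</tag1>).

```typescript
// The line from the question; \u00FF is the letter "ÿ".
const line: string =
  "<tag1>Quomodo restituendus locus Demosth. Ol\u00FFnth</tag1>";

// Hypothetical second line: the only dot sits between two tag sets.
const betweenTags: string = "<tag1>Primus</tag1> vide. <tag1>Secundus</tag1>";

// Original pattern from the question (backslashes doubled for the string literal).
const asciiSource: string = "(?<=(<tag1>(\\w|\\s)*))\\.(?=((\\w|\\s)*</tag1>))";

// Variant: \p{L} = letters, \p{M} = combining marks, \p{N} = numbers.
const unicodeSource: string =
  "(?<=<tag1>[\\p{L}\\p{M}\\p{N}_\\s]*)\\.(?=[\\p{L}\\p{M}\\p{N}_\\s]*</tag1>)";

// Collect the start offset of every match. `flags` must contain "g",
// otherwise exec() would keep returning the first match forever.
function dotOffsets(source: string, flags: string, text: string): number[] {
  const pattern = new RegExp(source, flags);
  const offsets: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    offsets.push(match.index);
  }
  return offsets;
}

// \w does not match "ÿ", so the lookahead never reaches </tag1>.
const asciiHits: number[] = dotOffsets(asciiSource, "g", line);

// With the u flag, \p{L} covers "ÿ" and the dot after "Demosth" is found.
const unicodeHits: number[] = dotOffsets(unicodeSource, "gu", line);

// The dot between two tag sets is not matched, because "<", "/" and ">"
// are not part of the character class.
const strayHits: number[] = dotOffsets(unicodeSource, "gu", betweenTags);

console.log(asciiHits, unicodeHits, strayHits);
```

If property escapes are not available, a negated class such as [^<>]* in place of (\w|\s)* might be another route, since it allows any character except the tag delimiters without relying on Unicode support.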



from Recent Questions - Stack Overflow https://ift.tt/3miCT4M
