Theprogrammersfirst: Python: Finding unicode characters within string using regex

2021-12-01

Python: Finding unicode characters within string using regex

I am trying to filter Unicode characters out of a JSON object (converted to a string) using a regex in Python, but I can't get the re.compile() call right; it keeps raising errors.

Here is the code:

    regex = re.compile("\u....")
    string = json.dumps(json)
    matches = re.findall(regex, string)
    print(matches)

This is producing this error:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

I have tried rewriting the pattern as (r"\u...."), ("\u....") and (r"\u...."), but none of these worked. They give me this error instead:

re.error: incomplete escape \u at position 0

What is the correct way to write a regex that finds Unicode characters in a string? Thank you.
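For context, both errors likely come from the same cause: `\u` starts an escape that needs four hex digits. In a normal string literal Python itself rejects `"\u...."`, and in a raw string the backslash reaches the `re` module, which rejects it for the same reason. A minimal sketch of one possible fix follows, assuming the goal is to find the `\uXXXX` escape sequences that `json.dumps` writes by default. The sample dictionary and the variable names are hypothetical, since the post does not show the actual JSON.

```python
import json
import re

# Hypothetical sample data; the original post does not show the JSON.
data = {"name": "café", "city": "Zürich"}

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# so the resulting string contains literal backslash-u sequences.
text = json.dumps(data)

# In a raw string, "\\u" matches a literal backslash followed by "u";
# the character class then requires exactly four hexadecimal digits.
escape_pattern = re.compile(r"\\u[0-9a-fA-F]{4}")
escapes = escape_pattern.findall(text)
print(escapes)

# Alternative: keep the real characters in the output and match
# anything outside the ASCII range instead.
raw_text = json.dumps(data, ensure_ascii=False)
non_ascii = re.findall(r"[^\x00-\x7F]", raw_text)
print(non_ascii)
```

Two caveats with this sketch: characters outside the Basic Multilingual Plane are written by `json.dumps` as two `\uXXXX` escapes (a surrogate pair), and the first pattern would also match an escaped backslash that happens to be followed by `u` and four hex digits.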

from Recent Questions - Stack Overflow https://ift.tt/3G0TilU
