Python XML Parse errors with Invalid Token
I am porting code from Python 2.7 to Python 3.10. One section of the code builds an XML string, which is then prettified and written to a file. Under Python 3.x, parsing that string raises an error. In one case the problem seems to be an encoded en dash character:
<?xml version='1.0' encoding='utf8'?>
<properties>
<entry key="name">AB&R - RFA #3 \xe2\x80\x93 Alignment</entry>
</properties>
The parsing is done as follows:
xml_parsed = xml.dom.minidom.parseString(xml_string)
return xml_parsed.toprettyxml(" ", "\n")
The error thrown is:
not well-formed (invalid token): line 2
I don't think this problem occurred under Python 2.7. There is a good description of the en dash here (although I don't think my problem is limited to the en dash).
What can be done to fix this?
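One thing worth checking, offered as a sketch rather than a confirmed diagnosis: the sample text contains a bare & (in AB&R), and a bare ampersand is not well-formed XML regardless of how the en dash is encoded. The snippet below assumes the en dash is a real U+2013 character in a Python 3 str, leaves out the XML declaration to keep the focus on the ampersand, and uses two hypothetical helpers, build_xml and try_parse, that are not part of the original code. It compares parsing the raw text with parsing the same text after xml.sax.saxutils.escape.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape

# Text from the question, with the en dash as a real character (assumption).
name = "AB&R - RFA #3 \u2013 Alignment"


def build_xml(text):
    # Hypothetical helper: wraps the given text in the document from the
    # question (XML declaration omitted on purpose).
    return (
        "<properties>\n"
        '<entry key="name">' + text + "</entry>\n"
        "</properties>"
    )


def try_parse(xml_string):
    # Hypothetical helper: same two calls as in the question, but the parser
    # error is returned as text instead of propagating.
    try:
        xml_parsed = xml.dom.minidom.parseString(xml_string)
        return xml_parsed.toprettyxml(" ", "\n")
    except ExpatError as exc:
        return "ExpatError: " + str(exc)


# Bare "&" left in the text: expected to be rejected by the parser.
raw_result = try_parse(build_xml(name))

# "&" escaped to "&amp;" before the text is placed in the document.
escaped_result = try_parse(build_xml(escape(name)))

print(raw_result)
print(escaped_result)
```

If the escaped version parses in your environment, the ampersand is at least part of the problem, and building the document with an XML library instead of string concatenation would remove the need to escape by hand. Whether the en dash also causes trouble depends on whether xml_string is bytes or str at the point of parsing, which the question does not show.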