How to write Chinese characters to a file based on Unicode code points in Python 3

By Ritesh Sahu - September 27, 2020

I am trying to write Chinese characters to a CSV file based on their Unicode code points, which come from a text file in unicode.org/Public/zipped/13.0.0/Unihan.zip. One example character is U+9109.

In the example below, I get the correct output by hardcoding the value (line 8), but every approach I've tried for generating the bytes from the code point (lines 14-16) produces the wrong result.

I'm running this in Python 3.8.3 on a Debian-based Linux distro.

Minimal working (broken) example:

  1 #!/usr/bin/env python3
  2 
  3 def main():
  4 
  5     output = open("test.csv", "wb")
  6 
  7     # Hardcoded values work just fine
  8     output.write('\u9109'.encode("utf-8"))
  9 
 10     # Comma separation
 11     output.write(','.encode("utf-8"))
 12 
 13     # Problem is here
 14     codepoint = '9109'
 15     u_str = '\\' + 'u' + codepoint
 16     output.write(u_str.encode("utf-8"))
 17 
 18     # End with newline
 19     output.write('\n'.encode("utf-8"))
 20 
 21     output.close()
 22 
 23 if __name__ == "__main__":
 24     main()

Executing and viewing results:

example $ ./test.py
example $ cat test.csv
鄉,\u9109
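
A likely explanation for this output: the escape '\u9109' is only interpreted when it appears in a string literal in the source code. Concatenating a backslash, 'u', and the hex digits at runtime produces six ordinary characters, which are then encoded as-is. The short check below is a sketch to illustrate the difference; the variable names are chosen here for illustration.

```python
# Escape written in the source: the parser turns it into one character.
literal = '\u9109'

# Escape assembled at runtime: six separate characters (\, u, 9, 1, 0, 9).
built = '\\' + 'u' + '9109'

print(len(literal), len(built))  # 1 6
print(literal.encode("utf-8"))   # b'\xe9\x84\x89'
print(built.encode("utf-8"))     # b'\\u9109'
```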

The expected output would look like this (Chinese character occurring on both sides of the comma):

example $ ./test.py
example $ cat test.csv
鄉,鄉
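
One way to get this output is to parse the hex digits as an integer and pass the result to the built-in chr(), which returns the character for a given code point. The following is a minimal sketch, not a definitive solution: the helper name char_from_codepoint is hypothetical, and it assumes the code point arrives as hex digits with an optional "U+" prefix, as in U+9109 above.

```python
#!/usr/bin/env python3

def char_from_codepoint(codepoint):
    """Return the character for a hex code point such as '9109' or 'U+9109'.

    Hypothetical helper; assumes valid hex digits with an optional 'U+' prefix.
    """
    hex_digits = codepoint.strip()
    if hex_digits.upper().startswith("U+"):
        hex_digits = hex_digits[2:]
    # int(..., 16) parses the hex digits; chr() maps the integer to a character.
    return chr(int(hex_digits, 16))

def main():
    # Text mode with an explicit encoding avoids calling .encode() on every write.
    with open("test.csv", "w", encoding="utf-8") as output:
        hardcoded = '\u9109'
        generated = char_from_codepoint('9109')
        output.write(f"{hardcoded},{generated}\n")

if __name__ == "__main__":
    main()
```

Because chr() works on the integer value, this approach also handles code points above U+FFFF, which have more than four hex digits and would not fit the four-digit \u escape form.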

from Recent Questions - Stack Overflow https://ift.tt/30e5gpH
