Handling encodings correctly matters as soon as a file contains non-ASCII characters. French text, for example, is full of accented characters such as “é”, “è” or “à”, and those are only a few of them.
Let’s see how to read content from a UTF-8 file and render the accents properly in a Jinja template.
Reading the File
Let’s take a JSON file that is encoded with UTF-8.
In Python, you read the file in the following manner:
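A minimal sketch of such a read, assuming a hypothetical file named data.json that is created here only so the example runs on its own. The last line shows what the same bytes look like when decoded as cp1252, a common non-UTF-8 default:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample file, written as UTF-8 bytes so the sketch is self-contained.
path = Path(tempfile.mkdtemp()) / "data.json"
path.write_bytes('{"city": "Orléans"}'.encode("utf-8"))

# No encoding argument: open() falls back to the platform's default encoding.
with open(path) as f:
    data = json.load(f)

# What a cp1252 default makes of the same UTF-8 bytes: "é" turns into "Ã©".
garbled = path.read_bytes().decode("cp1252")
print(garbled)
```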
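This reads the content, but open then uses the platform's default encoding. Where that default is not UTF-8 (on Windows it is often cp1252), non-ASCII characters like “é”, “è” or “à” come out mis-encoded, for example “Ã©” instead of “é”.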
How to Fix the Issue
It’s as simple as adding a parameter to the open function:
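A minimal sketch of the fix, again assuming a hypothetical data.json created on the fly. The only change to the read is the explicit encoding argument:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample file, written as UTF-8 bytes so the sketch is self-contained.
path = Path(tempfile.mkdtemp()) / "data.json"
path.write_bytes('{"city": "Orléans"}'.encode("utf-8"))

# Explicit encoding: the bytes are decoded as UTF-8 regardless of the platform default.
with open(path, encoding="utf-8") as f:
    data = json.load(f)

print(data["city"])
```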
With the encoding parameter, Python reads the content with the proper encoding.
Note: If the JSON file starts with a Byte Order Mark (BOM), use encoding='utf-8-sig' so that Python strips it; with plain 'utf-8', json.load rejects the leading BOM. If you really need to edit the file by hand, use an editor that can save valid UTF-8 content.
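The BOM case can be sketched as follows, assuming a hypothetical data_bom.json that begins with a UTF-8 BOM. The file name and the plain_utf8_failed flag are illustrative only:

```python
import codecs
import json
import tempfile
from pathlib import Path

# Hypothetical sample file that starts with a UTF-8 BOM, as some editors write it.
path = Path(tempfile.mkdtemp()) / "data_bom.json"
path.write_bytes(codecs.BOM_UTF8 + '{"city": "Orléans"}'.encode("utf-8"))

# Plain utf-8 keeps the BOM as a leading character, which json.load refuses.
try:
    with open(path, encoding="utf-8") as f:
        json.load(f)
    plain_utf8_failed = False
except json.JSONDecodeError:
    plain_utf8_failed = True

# utf-8-sig strips the BOM if present and behaves like utf-8 otherwise.
with open(path, encoding="utf-8-sig") as f:
    data = json.load(f)

print(plain_utf8_failed, data["city"])
```

Because utf-8-sig also reads files without a BOM, it is a reasonable choice when you do not control how the file was saved.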
About Jinja Template
Jinja2 templates expect Unicode strings. If the data passed to the template contains byte strings or incorrectly encoded strings, it might not render special characters correctly.
With the fix above, the data reaches the template as correctly decoded strings, so the rendered content is displayed correctly to the user.
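As a rough illustration, assuming the jinja2 package is installed and using a made-up inline template, correctly decoded strings pass through rendering unchanged:

```python
from jinja2 import Template

# Stand-in for data loaded from the JSON file with encoding="utf-8".
data = {"city": "Orléans"}

# Hypothetical inline template; a real project would usually load it from a file.
template = Template("<p>Bienvenue à {{ city }}</p>")
html = template.render(city=data["city"])

print(html)
```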
About Writing to the File
The json module in Python sets ensure_ascii=True by default when serializing data, which means every non-ASCII character is escaped as a \uXXXX sequence (“é” becomes \u00e9). If that serialized text is shown as-is in HTML or in a Jinja template, the user sees the escape sequences instead of the characters.
The solution: when serializing JSON data, you can set ensure_ascii=False to preserve the original Unicode characters:
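A minimal sketch comparing both settings, assuming a hypothetical output file named out.json in a temporary directory:

```python
import json
import tempfile
from pathlib import Path

data = {"city": "Orléans"}

# Default behaviour: non-ASCII characters are escaped.
escaped = json.dumps(data)
print(escaped)   # {"city": "Orl\u00e9ans"}

# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(data, ensure_ascii=False)
print(readable)  # {"city": "Orléans"}

# When writing to a file, also open it with an explicit UTF-8 encoding.
path = Path(tempfile.mkdtemp()) / "out.json"
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)
```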
Note: This setting doesn’t affect json.load, which decodes \uXXXX escapes either way. It is only relevant when you’re writing JSON data back to a file or passing serialized JSON to a template.
Photo by cottonbro studio.