QTextDocument, HTML, and Unicode: It’s All Greek To Me

To generate reports in my Qt-based software, I create a QTextDocument, which may contain some Unicode characters, convert it to HTML, and then save it to a file.

But the Unicode characters were not being displayed properly when I opened the file in a browser:

Unicode Not Displayed Properly

Looking at the generated HTML, I noticed that the content encoding was not being set in the header.

So I thought I could simply specify the encoding in the call to toHtml(), which would set the encoding in the HTML header.

That looks better! But wait…

Unicode Still Not Displayed Properly

Uhhh… That looks worse.

After scouring the interwebs, I eventually found the answer. I was writing the file through a QTextStream, which encodes its output based on the system locale, so if you don't set the codec explicitly you might not get what you expect. According to the QTextStream docs:

By default, QTextCodec::codecForLocale() is used, and automatic unicode detection is enabled.

As far as I can tell, the automatic detection only applies when reading a stream (it looks for a byte order mark), so it did not help when writing my file. The solution is to set the encoding manually using QTextStream::setCodec().

Aha! Now my Greek looked Greek!

Unicode Displayed Properly
