After some careful consideration, I decided to switch web hosts last night. It took a number of hours for the DNS settings to propagate, during which time I played around with the Wikipedia account I had created last August and then forgotten about, and got a few hours of troubled sleep.

When I woke up and transferred my WordPress database to the new MySQL server, I was surprised at how simple it was. Then I happened to look at my recent post on 7-Zip, which includes a couple of words of Danish. The heavily accented "Værktøjer" appeared as "V?rkt?jer," suggesting that there was a problem with the character encoding. I immediately assumed that my backup of the database - which I had previously changed to UTF-8 to prevent problems like this - had somehow been converted to a narrower character set, like ISO-8859-1 or (horror upon horrors) ASCII. This initially seemed correct, as the MySQL database containing my WordPress information was set to an ISO-8859-1 collation (which I assumed determines sorting orders in MySQL). After I changed this and re-imported my data, however, there was no change.
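To make that suspicion concrete, here is a minimal Python sketch of what a narrowing conversion does to the word. It is only an illustration: it uses Python's own codecs, not MySQL or WordPress, and the variable names are made up for the example. It shows that a lossy conversion to ASCII produces exactly the "V?rkt?jer" pattern, whereas ISO-8859-1 can actually hold both Danish letters.

```python
word = "Værktøjer"

# ASCII has no æ or ø, so a lossy conversion substitutes "?" for each one.
as_ascii = word.encode("ascii", errors="replace").decode("ascii")
print(as_ascii)  # V?rkt?jer

# ISO-8859-1 does contain both letters, storing each in a single byte.
as_latin1 = word.encode("iso-8859-1")
print(as_latin1)  # b'V\xe6rkt\xf8jer'

# UTF-8 needs two bytes for each of them.
as_utf8 = word.encode("utf-8")
print(as_utf8)  # b'V\xc3\xa6rkt\xc3\xb8jer'
```

So a question mark in place of each accented letter points to some conversion step that could not represent the character, rather than to ISO-8859-1 as such.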

Through the magic of Google, I found a thread on the WebmasterWorld forums which suggested I needed to change the default character set of the tables. Since changing the collation had changed the tables' character set automatically, it seemed like it should be working. After some poking around, I realized that "Værktøjer" and other accented words displayed fine in phpMyAdmin, a tool to manage MySQL databases. This suggested to me that something might be wrong with WordPress.

This reminded me of a thread on the WordPress support forums that I had been looking at earlier. In response to another WordPress user's problems with Unicode characters not showing up correctly, "Incubus" wrote:

"If you experience weird problems, like some UTF-8 characters (the Unicode character č and a few others in my case) seemingly being changed to garbage by mysql_query, you may need to do something like this before your actual query:

<?php
mysql_query("SET NAMES 'utf8'", $conn);
?>"

So is this being done by Wordpress? I think not because there are plenty of posts about the same issue. Any developer comments..?


Some testing with a short PHP script I whipped up confirmed that sending the MySQL server a "SET NAMES" query would fix my Unicode problems. I did some research into setting the character set on the server side instead (reading pages from the MySQL manual and looking for websites with relevant information; this post about MySQL and Python was particularly helpful), but was unable to make much headway. I resigned myself to patching wp-db.php, the file in WordPress that takes care of database connections. I would file a bug against WordPress about it, but it seems to be a problem with MySQL ignoring its own character set variables. Annoyingly, the whole thing wasted at least an hour of my time, but those are the breaks, I guess.
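For the curious, here is a rough Python simulation of why telling the server which character set the connection uses can matter. It assumes one plausible scenario, namely that the connection defaulted to ISO-8859-1 while the page was being served as UTF-8. I have not confirmed that this is what my server was doing, and the functions `fetch` and `render_as_utf8` are hypothetical stand-ins, not anything from MySQL or WordPress.

```python
stored = "Værktøjer"  # what the table holds, as seen correctly in phpMyAdmin


def fetch(text, results_charset):
    """Stand-in for the server converting a value to the connection's charset."""
    return text.encode(results_charset, errors="replace")


def render_as_utf8(raw):
    """Stand-in for a page that decodes whatever bytes it receives as UTF-8."""
    return raw.decode("utf-8", errors="replace")


# Assumed default: results come back as ISO-8859-1, which is not valid UTF-8 here.
before = render_as_utf8(fetch(stored, "iso-8859-1"))

# After the equivalent of SET NAMES 'utf8': results come back as UTF-8.
after = render_as_utf8(fetch(stored, "utf-8"))

print(before == stored)  # False
print(after == stored)   # True
```

In the "before" case each accented letter arrives as a single byte that is not valid UTF-8, so the decoder swaps in a replacement character, which many browsers and fonts show as a question mark.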