After some careful consideration, I decided to switch web hosts last night. It took a number of hours for the DNS settings to propagate. In the meantime I played around with the Wikipedia account I had created last August and then forgotten about, and got a few hours of troubled sleep.
When I woke up and transferred my WordPress database to the new MySQL server, I was surprised at how simple it was. Then I happened to look at my recent post on 7-Zip, which includes a couple of words of Danish. The heavily accented "Værktøjer" appeared as "V?rkt?jer," suggesting a problem with the character encoding. I immediately assumed that my database backup, which I had previously switched to UTF-8 to prevent problems like this, had somehow been converted to a narrower character set, like ISO-8859-1 or (horror of horrors) ASCII. This theory initially seemed plausible, as the MySQL database holding my WordPress data was set to an ISO-8859-1 collation (which I assumed determined sort order in MySQL). After I changed this and re-imported my data, however, nothing changed.
Through the magic of Google, I found a thread on the WebmasterWorld forums that suggested I needed to change the default character set of the tables. Since changing the collation had already changed the tables' character set automatically, it seemed like things should be working. After some poking around, I realized that "Værktøjer" and other accented words displayed fine in phpMyAdmin, a tool for managing MySQL databases. That meant the stored data was intact, which suggested that the problem lay with WordPress.
This reminded me of a thread on the WordPress support forums that I had been reading earlier. In response to another WordPress user's problems with Unicode characters not showing up correctly, "Incubus" wrote:
"If you experience weird problems, like some UTF-8 characters (the Unicode character č and a few others in my case) seemingly being changed to garbage by mysql_query, you may need to do something like this before your actual query:
<?php mysql_query("SET NAMES 'utf8'", $conn); ?>"
So is this being done by WordPress? I think not, because there are plenty of posts about the same issue. Any developer comments...?
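Here is a minimal PHP sketch of how a missing SET NAMES can turn accented letters into garbage. It rests on one assumption, stated loudly: the tables hold UTF-8 text, but the connection is left at MySQL's old default of latin1. In that case the server converts results to single-byte ISO-8859-1 before sending them. Those bytes are not valid UTF-8, so a page served as UTF-8 shows a replacement mark (often a "?") wherever an accented letter should be. The byte values are standard, but the scenario is illustrative. It is not a trace of what my server actually did.

```php
<?php
// Illustrative sketch: the same Danish word as UTF-8 bytes and as ISO-8859-1 (latin1) bytes.
// "\u{...}" escapes (PHP 7+) produce UTF-8; "\x.." escapes produce single raw bytes.
$utf8   = "V\u{00E6}rkt\u{00F8}jer"; // æ = C3 A6, ø = C3 B8 in UTF-8
$latin1 = "V\xE6rkt\xF8jer";         // æ = E6, ø = F8 in ISO-8859-1

// Same nine characters, different byte counts.
echo strlen($utf8), " bytes as UTF-8\n";    // 11 bytes
echo strlen($latin1), " bytes as latin1\n"; // 9 bytes

// With the /u modifier, preg_match() returns false if the subject is not valid UTF-8.
var_dump(preg_match('//u', $utf8));   // int(1)
var_dump(preg_match('//u', $latin1)); // bool(false)

// ENT_SUBSTITUTE replaces each invalid UTF-8 sequence with U+FFFD, much like a
// browser rendering the latin1 bytes on a page it believes is UTF-8.
echo htmlspecialchars($latin1, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'), "\n";
```

The valid UTF-8 string passes through `htmlspecialchars()` unchanged. The latin1 one comes out with a replacement character in place of each of the two accented letters.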
Some testing with a short PHP script I whipped up confirmed that sending the MySQL server a "SET NAMES" query would fix my Unicode problems. I did some research into setting the character set on the server itself (reading pages from the MySQL manual and looking for websites with relevant information; this post about MySQL and Python was particularly helpful), but was unable to make much headway. I resigned myself to patching wp-db.php, the WordPress file that handles database connections, so that it sends the SET NAMES query right after connecting. I would file a WordPress bug about it, but the problem seems to be MySQL ignoring its own character set variables. It was annoying to lose at least an hour to this, but those are the breaks, I guess.
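Here is a rough reconstruction of that kind of test script, not the one I actually ran. It uses mysqli rather than the mysql_* functions of the time, which were removed in PHP 7. The helper names connect_utf8 and print_charset_variables are hypothetical. Printing the session's character_set_* variables shows what SET NAMES actually changes.

```php
<?php
// Sketch only: hypothetical helper names, mysqli instead of the old mysql_* API.

// Print this session's character_set_* variables, one per line.
function print_charset_variables(mysqli $db): void
{
    // SHOW VARIABLES without GLOBAL reports the current session's values.
    $result = $db->query("SHOW VARIABLES LIKE 'character_set%'");
    while ($row = $result->fetch_row()) {
        echo $row[0], ' = ', $row[1], "\n";
    }
    $result->free();
}

// Open a connection and switch it to UTF-8 before running any other query.
function connect_utf8(string $host, string $user, string $pass, string $name): mysqli
{
    $db = new mysqli($host, $user, $pass, $name);
    // SET NAMES changes character_set_client, character_set_connection and
    // character_set_results, for this session only.
    $db->query("SET NAMES 'utf8'");
    return $db;
}
```

With placeholder credentials, call `$db = connect_utf8('localhost', 'user', 'password', 'wordpress'); print_charset_variables($db);`. Then compare the output against a connection opened with a plain `new mysqli(...)`. The three variables named in the comment should be the ones that differ. On PHP today, `$db->set_charset('utf8')` is the documented alternative to sending SET NAMES by hand, because it also tells the client library which character set to use when escaping strings. On newer MySQL versions, `utf8mb4` rather than `utf8` is needed to cover all of Unicode.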
# At 4:17 on August 13, 2005, d.f.h wrote:
Thx, that's very helpful to me, 'cause I've got the same problem.
# At 5:37 on October 26, 2005, Chandu wrote:
Thanks a lot for the info... I was facing the same problem. Glad to see someone who has solved it already. I will try it today.
# At 9:43 on December 23, 2005, Le-Fay wrote:
Thanks a lot! I had the same problem, and it did the trick for me.
# At 1:21 on May 16, 2007, Martey wrote:
This issue is fixed in WordPress 2.2, assuming you use the wp-config.php that comes with that version. That file defines a DB_CHARSET value, which wp-db.php uses to run a SET NAMES query.
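For reference, here are the character-set lines as they appear in the sample wp-config.php of that era. Treat the exact values as an assumption, and compare them with the wp-config-sample.php shipped with your own version before copying anything.

```php
// Character set that wp-db.php passes to SET NAMES when it connects.
define('DB_CHARSET', 'utf8');
// Collation; left empty, MySQL uses its default collation for the character set above.
define('DB_COLLATE', '');
```

If your data is stored in some other character set, DB_CHARSET should name that one instead. Setting it to utf8 only helps when the tables really hold UTF-8 text.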
# At 4:44 on September 14, 2007, Christophe wrote:
I found this post on UTF-8 and versions of WordPress before 2.2 useful:
http://jonkenpon.com/2007/02/20/making-your-wordpress-database-portable-because-it-probably-isnt-right-now/
# At 1:43 on August 2, 2008, Kathie M. Thomas wrote:
How annoying. All of my sites were recently moved from one server to another, and now I've got this problem with some of my blogs, but not all of them. I'm not sure I'm conversant enough to make the edits myself, but I'll pass this info on to my web guy in the hope that he can sort it out for me too.