After some careful consideration, I decided to switch web hosts last night. It took a number of hours for the DNS settings to propagate, during which time I played around with the Wikipedia account I created last August (and then forgot about) and got a few hours of troubled sleep.
When I woke up and transferred my WordPress database to the new MySQL server, I was surprised at how simple it was. Then I happened to look at my recent post on 7-Zip, which includes a couple of words of Danish. The heavily accented "Værktøjer" appeared as "V?rkt?jer," suggesting that there was a problem with the character encoding. I immediately assumed that my backup of the database - which I had previously changed to UTF-8 to prevent problems like this - had somehow been converted to a narrower character set, like ISO-8859-1 or (horror upon horrors) ASCII. This initially seemed correct, as the MySQL database containing my WordPress information was set to an ISO-8859-1 collation (which I assumed determines sorting orders in MySQL). After I changed this and re-imported my data, however, there was no change.
Through the magic of Google, I found a thread on the WebmasterWorld forums which suggested I needed to change the default character set of the tables. Since changing the collation had changed the tables' character set automatically, it seemed like it should be working. After some poking around, I realized that "Værktøjer" and other accented words displayed fine in phpMyAdmin, a tool to manage MySQL databases. This suggested to me that something might be wrong with WordPress.
This reminded me of a thread on the WordPress support forums that I had been looking at earlier. In response to another WordPress user's problems with Unicode characters not showing up correctly, "Incubus" wrote:
"If you experience weird problems, like some UTF-8 characters (the Unicode character č and a few others in my case) seemingly being changed to garbage by mysql_query, you may need to do something like this before your actual query:
<?php
mysql_query("SET NAMES 'utf8'", $conn);
?>"
So is this being done by Wordpress? I think not because there are plenty of posts about the same issue. Any developer comments..?
Some testing with a short PHP script I whipped up confirmed that sending the MySQL server a "SET NAMES" query would fix my Unicode problems. I did some research into setting the charset at the server (reading pages from the MySQL manual and looking for websites with relevant information; this post about MySQL and Python was particularly helpful), but was unable to make much headway. I resigned myself to patching wp-db.php, the file in WordPress which takes care of database connections. I would file a bug in WordPress about it, but it seems to be a problem with MySQL ignoring its own character set variables. It was annoying how it wasted at least an hour of my time, but those are the breaks, I guess.
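For the curious, here is a minimal sketch of what I suspect was happening at the byte level. It is written in Python rather than PHP so that it runs without a database, and it is not the test script mentioned above. It assumes (I have not confirmed this) that the MySQL connection was defaulting to latin1 while the blog page was declared as UTF-8; the function name as_browser_sees is made up for the illustration.

```python
# Hypothetical illustration, not WordPress or MySQL code.
# Assumption: the server converts stored text to the connection charset
# before sending it, and the browser then decodes the page as UTF-8.

def as_browser_sees(text: str, connection_charset: str) -> str:
    """Encode text for the connection, then decode it as a UTF-8 page."""
    wire_bytes = text.encode(connection_charset)
    # Bytes that are not valid UTF-8 become U+FFFD, the replacement
    # character that browsers typically draw as a question mark.
    return wire_bytes.decode("utf-8", errors="replace")

word = "Værktøjer"

# Connection left at latin1: the accented letters arrive as single bytes
# that are invalid in UTF-8.
broken = as_browser_sees(word, "latin-1")

# Connection switched to utf8 (the effect of SET NAMES 'utf8'):
# the bytes survive the round trip.
fixed = as_browser_sees(word, "utf-8")

print(broken)
print(fixed)
```

If that assumption holds, it would also explain why the text looked fine in phpMyAdmin: the stored data was intact, and only the connection used by WordPress was mangling it on the way out.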
thx, that's very helpful to me, cause i've got the same problem.
Thanks a lot for the info…
I was facing the same problem…. glad to see someone who has solved it already…
I will try it today…
Thanks a lot !
I had the same problem, and it did the trick for me.
[...] For some strange reason, WordPress would not let me upgrade until I disabled Brian's Threaded Comments. Once that was accomplished, I only had to patch wp-db.php again to get Unicode characters to display properly and edit my .htaccess to remove the changes that WordPress had made, which prevented me from accessing my RSS aggregator. [...]
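[...] Over the past few days I spent quite a lot of time on a server change, mainly because of how MySQL 4.0 handles UTF-8 encoding; it is more or less sorted out now. [...]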
[...] You may have noticed that it has been over a month since my last post. Oh, yeah, how time flies! My old web server died right before my grandma and mom came to visit me. So not until yesterday was I able to recover here. //sigh Moving to a new server turned out to be a big pain, just like moving to a new place in the real world. All the posts of my other WordPress blog with Chinese characters in UTF-8 encoding got messed up after the import. Considering the fact that the server was already dead, I almost panicked. Luckily, with the help of the almighty Google search, I found the following site complaining about the same issue: Wordpress UTF-8 Charset Woes. I then added the suggested query ("SET NAMES 'utf8'") into the wp-db.php and it worked! [...]
[...] It quickly became apparent I was not alone, as there are many topics regarding this problem and they all had a common "fix": adding something similar to $this->query("SET NAMES 'utf8'"); into WordPress's main database include. This command tells MySQL to treat its connections as utf8 (or maybe it automatically converts them to utf8). Thinking back to the previous query I ran, which showed the new server was handling all matters in utf8, I had little belief this would work, but tried it anyway as it seemed to solve so many other people's cases. Sure enough, this didn't work for me. Not the blindest bit of difference. Since the two databases were handling the connections differently, out of curiosity I tried this command instead: $this->query("SET NAMES 'latin1'"); To my understanding this *shouldn't* work… but it does. Anyway, everything is back to normal, although I'm a little more confused than I was before. [...]
This issue is fixed in WordPress 2.2, assuming you use the wp-config.php that comes with that version, which defines a 'DB_CHARSET' value that wp-db.php uses to run a SET NAMES query.
I found this post on the subject of UTF-8 and versions of WordPress below 2.2 to be useful:
http://jonkenpon.com/2007/02/20/making-your-wordpress-database-portable-because-it-probably-isnt-right-now/
[...] I realized it has got something to do with encoding UTF-8 charset, and found this useful article, Wordpress UTF-8 Charset Woes at MarteyDodoo.com. Will explore further and try to solve the problem. Posted by martinz Filed in [...]