RSS
19 Oct 2006

MySQL Unicode / UTF-8 Support Can Be Dangerous!!

Author: ceefour | Filed under: Beginner, Complaints, Web 2.0

Quick story:

  1. I already had tons of databases and tables in MySQL on Windows.
  2. I upgraded to MySQL 5.
  3. When asked to choose the default character set, I picked “UTF-8” (best support for multilingualism).
  4. All works right.
  5. I wanted to switch to Ubuntu Linux.
  6. I dumped all my databases:
    mysqldump —all-databases —opt -uroot | bzip2 > mysql.bz2
  7. Formatted my Windows partition then installed Ubuntu Linux 6.06.1 LTS (Dapper Drake).
  8. Installed MySQL 5.0:
    sudo apt-get install mysql-5.0
    (or something like this)
  9. Tried to load my dump:
    bzip2 -c mysql.bz2 | mysql
  10. Guess what comes up:
     MySQL Unicode / UTF 8 Support Can Be Dangerous!!

It turns out that my tables were originally in single-byte character set (Latin1 I suppose), but MySQL presumed that they were UTF-8. This causes mysqldump to generate the incorrect database schema, and then causes errors when loading the dump.

Fortunately, I had another backup: The entire MySQL data directory on the Windows partition. However, using:
mysqld —datadir=/my/windows/partition/mysql_data
doesn’t work.

So I gave up, installed MySQL 5.0 on my Windows inside VMware, using the existing data directory. Works fine, but this time I chose the default charset instead of UTF-8. I redid the mysqldump, then went back to Ubuntu to reload the dump. All fine without errors. After that I uninstalled MySQL from Windows.

I hope you won’t do the same mistake as I did. Don’t use UTF-8 as default character set in MySQL, at least not before this issue is solved completely by MySQL developers. Had I haven’t had another backup, things might get very bad for me (and for you too if you’re the unfortunate one).

Note that in any configuration, you can always specify the character set explicitly when creating the table (just as you can specify whether you want a MyISAM or InnoDB table). This is definitely the recommended way of doing things.

No related posts.

Related posts brought to you by Yet Another Related Posts Plugin.

  • neo

    But, before your mysql upgrade to 5.0 in Step 2, if you had the data in UTF-8, then
    you shouldn’t had this problem, correct ?,

  • http://hendy.gauldong.net/ Hendy Irawan

    Yes, I suppose that is correct.

    The problem is while upgrading, MySQL 5.0 doesn’t seem to “label” the old tables correctly. When I say “UTF-8 as default”, I’m actually expecting “UTF-8 as default for NEW tables”. Existing/old tables should use whatever encoding was used at the time of creation (if it was created with SQL_ASCII encoding as default, I suppose it should be SQL_ASCII from then on).

    This doesn’t seem to be the case in my experience.