MySQL, PHP, Apache, and UTF-8 Issues

UTF-8 is hell. I have run into problems with UTF-8 before (see Migrating MySQL to UTF-8 encoding).

I moved a site that was displaying fine from an old Apache server to a new Apache2 server and quickly noticed that pages served by the Apache2 server were displaying odd characters.

The site is running a PHP application with a MySQL backend. The MySQL database uses the latin1 character set (with the latin1_swedish_ci collation). The old site was displaying correctly. Both the old and new sites use the same database.

So what was going on?

The client supplies content as Word documents. The text is copied from the Word document and pasted into a TinyMCE editor. Certain characters in Word, such as the opening and closing double quotes, are (I think) Unicode characters rather than plain ASCII. These characters were saved in the MySQL database without issue.
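To make the problem concrete, here is a minimal Python sketch of how one of those Word quotes can turn into odd characters. It assumes the two encodings in play are Windows-1252 and UTF-8, which is a common pairing but not something I confirmed for this site. The render helper is hypothetical and only imitates a browser decoding bytes with whatever charset it was told to use.

```python
# Illustration only: the same curly quote as bytes in two encodings.
# Assumption: Windows-1252 and UTF-8 are the encodings involved.
LEFT_QUOTE = "\u201c"  # the opening double quote that Word inserts

cp1252_bytes = LEFT_QUOTE.encode("cp1252")  # a single byte
utf8_bytes = LEFT_QUOTE.encode("utf-8")     # three bytes


def render(raw: bytes, declared_charset: str) -> str:
    """Decode bytes the way a browser would if told `declared_charset`.

    Hypothetical helper; undecodable bytes become U+FFFD.
    """
    return raw.decode(declared_charset, errors="replace")


# ascii() is used so the output is safe on any terminal encoding.
print(cp1252_bytes.hex(), utf8_bytes.hex())   # 93 e2809c
print(ascii(render(utf8_bytes, "utf-8")))     # the quote itself
print(ascii(render(utf8_bytes, "cp1252")))    # three junk characters
print(ascii(render(cp1252_bytes, "utf-8")))   # one replacement character
```

The bytes never change. Only the charset the browser is told to use changes, and that is enough to turn one quote into three junk characters or a replacement mark.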

After a few hours of bloodying my forehead on the brick wall known as UTF-8 I finally determined the sources of the different encodings.

To fix the issue on the Apache2 server I did the following:

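I set "AddDefaultCharset Off" for the site in question on the Apache2 server. The problem was fixed. AddDefaultCharset makes Apache add a charset to the Content-Type header of text/html and text/plain responses, and browsers trust that header over anything declared in the page itself. Presumably the new server was announcing a charset that did not match the bytes coming out of the database. With the directive off, Apache no longer sends a charset and the browser falls back to what the page declares.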
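A quick way to see what a server is announcing is to look at the charset in its Content-Type header. The sketch below is one way to pull that value out of a header string in Python. The declared_charset function is a hypothetical helper, and the header values are examples, not captures from my servers.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset named in a Content-Type header value, or None.

    Hypothetical helper built on the standard library's header parser.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# Example header with a default charset added by the server.
print(declared_charset("text/html; charset=UTF-8"))  # utf-8
# Example header with no charset, as sent with AddDefaultCharset Off.
print(declared_charset("text/html"))                 # None
```

Against a live site, the headers object returned by urllib.request.urlopen has the same get_content_charset() method, so the old and new servers can be compared directly.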

I also set the MySQL client default encoding to UTF-8 in the MySQL option file (my.cnf). This did not fix the problem, but it is a good idea anyway:

[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
default-character-set = utf8
