Debian

Migrating MySQL to UTF-8 encoding

While I was developing a content management system (CMS) using the default MySQL settings, an issue came to light with regard to UTF-8 encoding. Clients were sending in documents in Microsoft Word format that were encoded with UTF-8. When the data was copied from a document and pasted into the CMS WYSIWYG editor, strange characters would be displayed after saving the document.
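The "strange characters" are typical of UTF-8 bytes being read back as a single-byte encoding. The following is only a minimal sketch of that effect, not a reproduction of the CMS itself: it assumes the MySQL default at the time was latin1, and it uses Python's cp1252 codec as a stand-in for it. The sample text is hypothetical.

```python
# Hypothetical text pasted from a Word document: a curly apostrophe
# (U+2019) and an accented letter (U+00E9).
original = "It\u2019s a caf\u00e9"

# The browser submits the text as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# Assumption: somewhere between the editor and the table, the bytes are
# treated as latin1. cp1252 is used here as a stand-in for that encoding.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # prints: Itâ€™s a cafÃ©

# The bytes themselves are intact, so reversing the wrong decoding
# recovers the original text.
recovered = garbled.encode("cp1252").decode("utf-8")
print(recovered == original)  # prints: True
```

Each multi-byte UTF-8 character turns into two or three unrelated characters, which matches the kind of output described above.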

A second issue was identified where exported MySQL tables that contained UTF-8 encoded characters were not being read correctly by Debian. Note that I am using an old installation of Debian (fully updated, of course) that was rolled out before the widespread adoption of UTF-8. Newer installations may already have UTF-8 enabled by default.

The goals of this HOWTO are as follows:

  1. set newly created tables in MySQL to use UTF-8 encoding
  2. convert existing MySQL tables to UTF-8
  3. set default environment (command line) encoding to use UTF-8
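As a rough preview of the first two goals, the sketch below builds the kind of SQL statements that are commonly used for them. It is one possible approach, not necessarily the one this HOWTO follows: `ALTER DATABASE ... CHARACTER SET` changes the default for tables created later in that database, and `ALTER TABLE ... CONVERT TO CHARACTER SET` converts an existing table. The helper `conversion_statements`, the database name `cms`, and the table names are all hypothetical. The third goal (the command-line environment) is a locale setting and is not covered here.

```python
def conversion_statements(database, tables,
                          charset="utf8", collation="utf8_general_ci"):
    """Return SQL statements that switch a database and its tables to UTF-8.

    Hypothetical helper: it only builds strings and does not connect to MySQL.
    """
    # Goal 1: tables created later in this database default to the charset.
    statements = [
        f"ALTER DATABASE `{database}` CHARACTER SET {charset} "
        f"COLLATE {collation};"
    ]
    # Goal 2: convert each existing table and its text columns.
    for table in tables:
        statements.append(
            f"ALTER TABLE `{database}`.`{table}` "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


# Hypothetical database and table names, for illustration only.
for statement in conversion_statements("cms", ["pages", "comments"]):
    print(statement)
```

Back up the database before running statements like these. Note also that MySQL's `utf8` character set stores at most three bytes per character; on versions that support it, `utf8mb4` covers the full UTF-8 range.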