While developing a content management system (CMS) using default MySQL settings an issue came to light in regards to UTF-8 encoding. Clients were sending in documents in Microsoft Word format that were encoded with UTF-8. When the data was copied from the document and pasted into the CMS WYSIWYG editor strange characters would be displayed after saving the document.
A second issue was identified where exported MySQL tables that contained UTF-8 encoded characters were being not being read correctly by Debian. Note that I am using an old installation of Debian (fully updated of course) that was rolled out before widespread adoption of UTF-8. Newer installations may already have UTF-8 enabled by default.
The goals of this HOWTO are as follows:
I am using MySQL on Debian and the default MySQL settings use latin1_swedish_ci as the character set for newly created tables. To change the default encoding to UTF-8 open /etc/mysql/my.cnf and add 'default-character-set = utf8' to the '[client]' section. The configuration should read like this:
The basis of converting a table to UTF-8 is:
NOTE: When I ran this script for a migration of a table with a compound index I received an error:
For more information on this issue please refer to the article here.
I wrote a script to automate the migration of larger MySQL databases to use UTF-8. Note that I rely on ~/my.cnf to provide the username and password.
DATABASE=$1
if [ ! "$DATABASE" ]; then
echo "Please specify a database"
exit
fi
BACKUPDIR="/root/tmp/mysql_backups/"
if [ ! -d "$BACKUPDIR" ]; then
mkdir -p "$BACKUPDIR"
fi
BACKUP="$BACKUPDIR""$DATABASE.sql"
mysqldump --add-drop-table --extended-insert "$DATABASE" > "$BACKUP"
TABLES=`mysql --batch --execute 'SHOW TABLES' "$DATABASE" | grep -v "^Tables_in"`
for TABLE in $TABLES; do
echo 'ALTER TABLE `'"$TABLE"'` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;';
# uncomment the following line to process the commands
#mysql --execute 'ALTER TABLE `'"$TABLE"'` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;' "$DATABASE"
done
There are a number of ways to set the default locale. The easiest is to reconfigure the locales. Run the command below and select the UTF locales you want. During the reconfigure process you will be able to select your default locale.
Log out and log back in and you will have a default UTF-8 charset. To check run the command:
The output should read 'UTF-8'.
MySQL, PHP, Apache, and UTF-8 Issues
Convert MySQL table to utf8
Step by step introduction to switching your debian installation to utf-8 encoding
How to change your default locale on Ubuntu Linux
Comments
Post new comment