that &*^%ing UTF-8 MySQL *&^@ing truncate bug again
5 August 2008
MySQL will silently truncate UTF-8 strings at the first "damn foreigner" character or even m-dash or Microsoft Word Apostrophe. If you set it to strict mode via your my.ini file you can cause it to correctly throw the error "data truncated" which is still an incomprehensible error, but at least its not a silent error. I detest silent errors. Otherwise you will just wonder why your data got whacked.
the solution:
=============
Dump your database to a sql file, including drop table commands.
mysqldump --add-drop-table database > db.sql
use this command to convert the encoding:
iconv -f latin1 -t utf-8 < db.sql > db_utf8.sql
from latin1 to utf8, the input file is db.sql, the output file is db_utf8.sql
use a text editor to change all table definitions to: DEFAULT CHARSET=utf8
or do it this way:
cat db2.sql | replace CHARSET=latin1 CHARSET=utf8 > db_utf8.sql
and then reload your database.
mysql database < db2_utf8.sql
Change your database to UTF-8
// sql <br /> alter database cruxxial_md CHARACTER SET = utf8;
(note cruxxial_md is the name of the database)
so that all tables created in the future (in this database) will be UTF-8, all those in the past are still uh ... Swedish Latin Lover or whatever they use up North.
You can set the mysql default charset to utf8 (for all databases created in the future)
by editing:
/etc/my.ini
default-character-set=utf8
character-set-server=utf8
default-collation=utf8_unicode_ci
but my conf file here is not yet correct and the documentation is tedious and the examples mysql gives do not include this most basic of settings via an example. so I still haven't solved that part.
The following is PHP specific:
Make sure your web application has UTF-8 as a meta tag on the page with the FORM that you edit your entries/data with:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />;
this is what caught me.
then your application must perform these queries on each page load, or you can put them into your my.ini file :
doQuery("SET NAMES utf8");<br /> // not sure if this one is needed but ...<br /> doQuery("SET character_set_server=utf8");
some code like this to check from inside your web app that the char vars are all set correctly:
$q =getQuery("SHOW VARIABLES LIKE 'char%'");<br /> while($obj = fetch_object($q)) {<br /> d($obj);<br /> }
more posts in tech notes
- django.db.utils.DatabaseError: relation "django_content_type" does not exist
p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 11.0px Monaco} I was getting an error while running unit tests and the test database was failing to be created.django.db.utils.DatabaseError: relation "django_content_type" does not existeventually found the problem. I had the following in a file that was imported to a models.py:hrd_type = ContentType.objects.get_for_model(HttpReferrerDomain)The idea being that it can just set the var when it loads up and after that its always there. But this means for creating a fresh database, the db is ...
- django has two classes called ValidationError
There is one in django.core.exceptions and one in django.forms.util Using that space age IDE Eclipse I have to say I'm enjoying how much time I've saved just going shift-command-O to organize and resolve all of my imports. But today I've just lost a few hours due to my ok-ing the wrong class. Quite mysterious it was, I raised a ValidationError (core exceptions one) in my form's clean() and watched as the try: except ValidationError: in django's full_clean() completely ignored my ...
- GDAL fails to build: `.rodata' can not be used when making a shared object; recompile with -fPIC
libtool: link: g++ -shared -nostdlib /usr/lib/gcc/x86_64-linux-gnu/4.4.1/../../../../lib/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.4.1/crtbeginS.o .libs/libgdal.la.lnkscript -L/usr/local/lib /usr/local/lib/libgeos_c.so /usr/local/lib/libgeos.so /usr/local/lib/libexpat.so -L/usr/lib -lpq -lrt -ldl /usr/lib/libcurl.so -lssl -lcrypto -lz -L/usr/lib/gcc/x86_64-linux-gnu/4.4.1 -L/usr/lib/gcc/x86_64-linux-gnu/4.4.1/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/4.4.1/../../.. -lstdc++ -lm -lc -lgcc_s /usr/lib/gcc/x86_64-linux-gnu/4.4.1/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/4.4.1/../../../../lib/crtn.o -Wl,-soname -Wl,libgdal.so.1 -o .libs/libgdal.so.1.13.2 /usr/bin/ld: /usr/local/lib/libz.a(crc32.o): relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC /usr/local/lib/libz.a: could not read symbols: Bad value collect2: ld returned 1 exit status make[1]: *** [libgdal.la] Error 1 make[1]: Leaving directory `/home/crucial/tmp/gdal-1.6.2' ...
- installing MySQLdb on Ubuntu (mysql-python)
MySQLdb is the python support bindings for MySQL. Not that the name would lead you to beleive that. Its sourceforge page calls it http://sourceforge.net/projects/mysql-python/ which makes more sense. you need setuptools, which you usually already have: sudo aptitude install python-setuptools You need MySQL-devel to compile, but its not called that, its called: libmysql++-dev on Ubuntu sudo apt-get install libmysql++-dev download MySQLdb itself from: http://sourceforge.net/projects/mysql-python/ # the version you download will be more recent tar xfz ...
- Full index for tech notes
- django.db.utils.DatabaseError: relation "django_content_type" does not exist


1 Ali says...
Great post man!
You mentioned that you can do the equivalent of "SET NAMES 'utf8'" in my.ini file. Do you happen to know the name of the directive that actually does that? I've been looking all over and can't find it!
Thanks :)
Posted at 5:43 p.m. on July 2, 20092 crucial says...
yo ali –
it can also be called my.cnf, maybe that's more normal. I chose to do it in my php code. (actually I've switched to Django which has mature support for all languages and charsets)
/etc/mysql/my.cnf: The global mysql configuration file.
This file can be simultaneously placed in three places:
Posted at 6:01 p.m. on July 2, 20091. /etc/mysql/my.cnf to set global options.
2. /var/lib/mysql/my.cnf to set server-specific options.
3. ~/.my.cnf to set user-specific options.