Wednesday, October 21, 2009

Character Encoding UTF-8 with JPA/Hibernate, MySql and Tomcat

I'm writing a little Java application using JPA (Hibernate implementation) and Spring. The application will run on Tomcat and uses MySql as the RDBMS.

The problem I had today was with the good old character encoding: I was able to store German Umlaut characters (üöä) properly in MySql, but whenever I retrieved them, they would be scrambled - regardless of whether I displayed the result on a web page or just printed it to the console.
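
One common way this kind of scrambling arises is that one layer writes the text as UTF-8 bytes and another layer reads those bytes with a single-byte charset such as ISO-8859-1. The sketch below only illustrates that mechanism. It is not a diagnosis of this particular setup, and the class and method names are made up for the example.

```java
import java.nio.charset.Charset;

class MojibakeDemo {

    // Encodes the text with one charset and decodes the resulting bytes with
    // another, which is what happens when two layers disagree on the encoding.
    static String misdecode(String text, String writtenAs, String readAs) {
        byte[] bytes = text.getBytes(Charset.forName(writtenAs));
        return new String(bytes, Charset.forName(readAs));
    }

    public static void main(String[] args) {
        String umlauts = "\u00fc\u00f6\u00e4"; // the three umlauts üöä

        // Written as UTF-8 but read as ISO-8859-1: each umlaut turns into two characters.
        String scrambled = misdecode(umlauts, "UTF-8", "ISO-8859-1");
        System.out.println(scrambled.length()); // 6 instead of 3

        // Written and read with the same charset: the text survives the round trip.
        System.out.println(misdecode(umlauts, "UTF-8", "UTF-8").equals(umlauts));
    }
}
```

If the stored bytes are fine and only the reading side is wrong, the data itself is not damaged, which matches the symptom of correct storage but scrambled retrieval.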

So, the problem is: how to consistently set UTF-8 as the character encoding of choice throughout the whole stack:

  • for MySql as well as for any session coming through the JDBC driver in order to ensure that any entity created by Hibernate/JPA uses UTF-8
  • for Tomcat to make sure that any data served uses UTF-8

I know that you'll find a lot of material on the solution for each individual piece of software in my tech stack across the net. However, I still think it's worth posting this solution, as I did not find all elements of it in one place (and don't want to search again the next time :-).

Here's what I did:

1. Ensure that MySql runs on UTF-8 as default: in the MySql configuration file my.cnf add the following in the section for mysqld:

[mysqld]
...
default-character-set=utf8
default-collation=utf8_general_ci
...
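
To check whether the server settings actually took effect, one option is to ask MySql for its character set and collation variables over a plain JDBC connection. This is a minimal sketch: the class name, URL, user and password are placeholders, and it assumes the MySql Connector/J jar is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

class CharsetCheck {

    // Reads the character set and collation variables as seen by this connection.
    public static Map<String, String> readVariables(Connection con) throws SQLException {
        Map<String, String> vars = new LinkedHashMap<String, String>();
        String[] queries = {
            "SHOW VARIABLES LIKE 'character_set%'",
            "SHOW VARIABLES LIKE 'collation%'"
        };
        Statement st = con.createStatement();
        try {
            for (String query : queries) {
                ResultSet rs = st.executeQuery(query);
                while (rs.next()) {
                    // Column 1 is the variable name, column 2 its value.
                    vars.put(rs.getString(1), rs.getString(2));
                }
                rs.close();
            }
        } finally {
            st.close();
        }
        return vars;
    }

    // True if the named variable is present and its value starts with "utf8".
    public static boolean isUtf8(Map<String, String> vars, String name) {
        String value = vars.get(name);
        return value != null && value.toLowerCase().startsWith("utf8");
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder URL and credentials; replace them with your own.
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
        try {
            Map<String, String> vars = readVariables(con);
            for (Map.Entry<String, String> entry : vars.entrySet()) {
                System.out.println(entry.getKey() + " = " + entry.getValue());
            }
            System.out.println("server is utf8: " + isUtf8(vars, "character_set_server"));
        } finally {
            con.close();
        }
    }
}
```

Running it once without and once with the URL parameters from step 2 shows which variables are controlled by the server configuration and which by the connection.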

2. Configure your MySql JDBC driver connection as follows (hostname, port and schema will probably be different in your configuration :-):

jdbc:mysql://localhost:3306/test?useUnicode=true&connectionCollation=utf8_general_ci&characterSetResults=utf8

When configuring the above driver URL in your Spring XML context definition, don't forget to escape each ampersand as &amp;, as you will get parsing errors otherwise.
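
To make the escaping concrete, here is a small sketch that prints the URL in the form it needs to take inside an XML file. The helper and class names are made up for this example, and it assumes the input URL has not been escaped already.

```java
class JdbcUrlXml {

    // Replaces each raw '&' with the XML entity so the URL can be pasted
    // into an XML attribute or element. Do not apply it twice.
    static String escapeAmpersands(String url) {
        return url.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String url = "jdbc:mysql://localhost:3306/test"
                + "?useUnicode=true"
                + "&connectionCollation=utf8_general_ci"
                + "&characterSetResults=utf8";
        // Prints the URL as it should appear in the Spring XML context definition.
        System.out.println(escapeAmpersands(url));
    }
}
```

The XML parser turns the entities back into plain ampersands, so the JDBC driver still receives the original URL.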

3. Configure Tomcat for UTF-8 by adding the following line to your catalina.sh (in catalina.bat, use the equivalent Windows syntax, set JAVA_OPTS=%JAVA_OPTS% ...):

JAVA_OPTS="$JAVA_OPTS -Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8"
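
A quick way to see whether the JVM picked up the option is to print the file.encoding system property together with the JVM's default charset. The sketch below uses a made-up class name. Inside Tomcat, the same two calls could be logged from a servlet or JSP instead of a main method.

```java
import java.nio.charset.Charset;

class EncodingReport {

    // Builds a one-line summary of the encoding settings this JVM is running with.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

For a standalone test, the class can be started with the same flag, for example: java -Dfile.encoding=UTF-8 EncodingReport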

Versions I am using: MySql 5.0, Tomcat 6.0.20, Spring 2.5.6, Java 6, MySql Connector 5.1.6

Happy hacking!

19 comments:

wei said...

But for me it didn't work until I set

init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8'
character-set-server = utf8
collation-server = utf8_general_ci

in the my.ini.

Saeid Zebardast said...

Thank you :)
step 1, solved my problem.

JEUS said...

Hi,
I'm using JPA with Glassfish 3 and MySql 5.1. When I insert Persian data into MySql through my app, question marks are stored instead.
My tables are set to UTF-8, and when I insert directly, the data is correct.
I have a "sun-resources.xml" file (http://paste.ideaslabs.com/show/XWBCp5zl0Y), but it does not work correctly.
Please guide me on how to change the encoding in my app.
My email: alkhandani@gmail.com

CAMOrOH13 said...

jdbc:mysql://localhost/somedb?useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8

helps you

Alberto said...

Man, you made my day with this post:)
The information about charsets and locale problems is so extremely scattered out there that you need two weeks to get the relevant stuff out of it. With your post, that becomes two minutes.

Thanks for sharing your great article.

Radu said...

Great work. Thank you for the hint. It worked perfectly for me on Kubuntu 11.04 / 64 bits.

Anonymous said...

Thank you, my problem was solved by hibernate connection encoding settings :)

ajit samanta said...

Dear brother, I tried all the points that you mention, but I still have the same problem. When I insert UTF-8 data using the plain JDBC driver, it is stored perfectly, but when I use JPA and Spring to store the same data, it is not stored in the correct format. I use the same versions that you mention. Please help me; I have been trying to solve this problem for two months.
Thanks.

Olga Sidorenco said...

Thanks a million!!! The first suggestion fixed my Russian characters. Nothing else would!! :-)

D.M. said...

You solved ALL my Problems! THX!

jjj said...

yle xar ytleeeeeeeeeee