Clojure, REPL and UTF-8

March 15, 2009

Rich Hickey gave a fantastic presentation at QCon on “Persistent Data Structures” and Clojure - his Lisp-inspired, functional programming language for the Java Virtual Machine. Inspired by his talk, I have been playing around with Clojure all day. There’s a great plugin for Vim out there called VimClojure, and after a few failed attempts I got a nice REPL up and running inside Vim (I might post something on how to set this up at a later point, since the documentation is fairly sparse). So far so good, but when I started playing around with the REPL I got a bit of a shock.

Evaluating (count "españa") returned 7! Doing (.toUpperCase "españa") gave me "ESPAñA"! I thought Unicode was (pretty much) a solved problem in the Java world, so these weird results with non-ASCII characters were quite a surprise.

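Turns out the problem is not strictly Clojure's. Digging through some mailing list posts I found out what was going wrong: running (java.nio.charset.Charset/defaultCharset) showed me that the default charset for JVMs on my Mac was set to MacRoman, not UTF-8. That fits the symptoms: ñ takes two bytes in UTF-8, and a single-byte charset like MacRoman decodes each of those bytes as a character of its own.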
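As a rough sketch of what was probably happening, the mismatch can be reproduced without touching any JVM settings: encode the word as UTF-8, then decode the bytes with a single-byte charset. I am using ISO-8859-1 here as a stand-in for MacRoman, because it is guaranteed to exist on every JVM; that substitution, and the names word, utf8-bytes and misread, are my own assumptions for illustration.

```clojure
(import 'java.nio.charset.Charset)

;; The charset this JVM falls back to when none is given explicitly.
(println (str (Charset/defaultCharset)))

;; "españa", written with an escape so the encoding of this file does not matter.
(def word "espa\u00f1a")

;; What a UTF-8 terminal or editor actually sends: ñ becomes two bytes.
(def utf8-bytes (.getBytes ^String word "UTF-8"))

;; Decoding those bytes with a single-byte charset turns each byte into one char.
;; ISO-8859-1 is a stand-in for MacRoman here (assumption, see above).
(def misread (String. ^bytes utf8-bytes "ISO-8859-1"))

(println (count word))       ; 6
(println (count utf8-bytes)) ; 7
(println (count misread))    ; 7

;; Upper-casing the misread string leaves the two stray chars alone. Writing the
;; result back out as single bytes and viewing it as UTF-8 shows a lower-case ñ.
(def round-trip
  (String. (.getBytes (.toUpperCase ^String misread) "ISO-8859-1") "UTF-8"))

(println round-trip)         ; ESPAñA
```

The last step is only my guess at why the REPL showed a lower-case ñ: the two characters that ñ was split into have no upper-case form, so they pass through unchanged and are displayed as ñ again on the way out.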

There seems to be some discussion on whether Clojure should default to UTF-8 for the REPL or use the default setting from the host operating system, but the easy fix for now is just to change the default for JVMs, which seems like the sane thing to do anyway.

export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF8"

This solved the problem for me and now my Vim REPL finally knows how to count the characters of Spain in Spanish.
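To check that the setting was picked up, something along these lines can be evaluated in a freshly started REPL. It is a minimal sketch; the helper name utf8-default? is made up for this post and is not part of Clojure.

```clojure
(import 'java.nio.charset.Charset)

;; Hypothetical helper: true when the JVM's default charset is UTF-8.
(defn utf8-default? []
  (= (Charset/forName "UTF-8") (Charset/defaultCharset)))

;; The system property that the -Dfile.encoding option sets.
(println (System/getProperty "file.encoding"))

;; Should print true once the JVM has been restarted with the new option.
(println (utf8-default?))
```

The real test is still typing (count "españa") with a literal ñ, since that exercises the path from the terminal or editor into the REPL.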


Posted by Mathias Biilmann. Category: Clojure. Tags: clojure.
