The people involved in Han unification (primarily Westerners) tended to
collapse characters that were similar, but not identical, across Asian languages. In the
early days of Unicode, rendering software would get confused and display similar,
but incorrect, glyphs for the Han-unified characters. This was at best disconcerting;
at worst, offensive.
There are technical solutions to all of these problems today, but Unicode was a slow
starter in Japan. Other encodings such as Shift_JIS gained more currency there
at the time, which may actually have compounded the problem: the more character
sets in use, the more conversion issues arise.*
Multilingualization in Ruby 1.9
Ruby 1.9 will support multilingualization (m17n). Rather than a built-in Unicode
assumption, Ruby 1.9 will support interoperability between multiple character sets.
This is more flexible than assuming that all string literals are Unicode, and it is a
more general approach to character set handling. To use UTF-8 for all string and
regex literals, the following pragma can be used:
# coding: utf-8
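As a rough illustration of what per-string encodings look like in practice, the following sketch assumes the String#encoding, String#encode, and String#bytesize methods as they shipped in Ruby 1.9. The encoding_report helper is a hypothetical name introduced only for this example. It shows that the same three characters occupy a different number of bytes depending on the encoding the string is tagged with:

```ruby
# coding: utf-8

# Hypothetical helper (not part of Ruby or Rails): summarize how a string
# looks in its own encoding and after transcoding to a target encoding.
def encoding_report(text, target)
  converted = text.encode(target)   # transcode; raises if a character cannot be mapped
  {
    :source_encoding => text.encoding.name,
    :target_encoding => converted.encoding.name,
    :characters      => text.length,        # counted in characters, not bytes
    :source_bytes    => text.bytesize,
    :target_bytes    => converted.bytesize,
    # Transcode back and compare with the original string.
    :round_trips     => converted.encode(text.encoding) == text
  }
end

# With the pragma above, this literal is tagged as UTF-8.
report = encoding_report("日本語", "Shift_JIS")
report.each { |key, value| puts "#{key}: #{value}" }
```

Each string carries its own encoding, so the UTF-8 and Shift_JIS versions can coexist in one program; conversion happens only where it is requested explicitly.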
* Matz expresses this sentiment in an interview available at http://blog.grayproductions.net/articles/the_ruby_vm_episode_iv.
ActiveSupport::Multibyte
In the absence of complete multibyte character support in Ruby 1.8, Rails provides a
workaround. We touched on this solution, ActiveSupport::Multibyte, back in
Chapter 2.