
Brad Ediger

"Advanced Rails"


The Unicode Basic Multilingual Plane (BMP), which contains most of the scripts in
common use today, covers code points U+0000 through U+FFFF. In UTF-8, code
points in the BMP can be expressed in three or fewer bytes. Though Unicode supports
up to 17 planes of characters (with 65,536 code points each), only about 10%
of the available space has been assigned thus far.
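
The three-byte bound for the BMP can be checked directly. The following is a minimal sketch, not taken from the original text: the helper name utf8_byte_length is hypothetical, and it assumes a Ruby that provides String#bytesize (1.8.7 or later). Array#pack with the "U" directive encodes an integer code point as UTF-8.

```ruby
# Hypothetical helper: number of bytes UTF-8 needs for one code point.
# Array#pack("U") encodes the integer as a UTF-8 byte sequence.
def utf8_byte_length(code_point)
  [code_point].pack("U").bytesize
end

utf8_byte_length(0x0041)   # => 1  (U+0041, ASCII range)
utf8_byte_length(0x00E9)   # => 2  (U+00E9, Latin-1 Supplement)
utf8_byte_length(0x20AC)   # => 3  (U+20AC, euro sign)
utf8_byte_length(0xFFFF)   # => 3  (last code point in the BMP)
utf8_byte_length(0x10000)  # => 4  (first code point beyond the BMP)
```

Only code points outside the BMP need a fourth byte.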
Rails and Unicode
Ruby 1.8 has less-than-ideal Unicode support compared to contemporaries
such as Java and the .NET languages. To Ruby, strings are just sequences of 8-bit
bytes, while the character and string types of the Java runtime and .NET CLR are
based on Unicode code points. While Ruby's approach simplifies the language, most
developers now need Unicode support. Luckily, Ruby is flexible
enough that we can tack support for Unicode onto the language in a relatively
friendly way.
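
The gap between bytes and characters is easy to see with the pack and unpack directives, which behave the same way regardless of how String#length counts. This is an illustrative sketch rather than code from the original text; the variable names are arbitrary, and the string is built from code points so that the source file stays plain ASCII.

```ruby
# "resume" with two accented e's (U+00E9), built from code points.
word = [0x72, 0xE9, 0x73, 0x75, 0x6D, 0xE9].pack("U*")

# "C*" unpacks raw 8-bit bytes; "U*" decodes UTF-8 into code points.
byte_count = word.unpack("C*").length   # => 8
char_count = word.unpack("U*").length   # => 6
```

On Ruby 1.8, word.length returns the byte count (8 here), which is why the six-character word appears longer than it is.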
It is not surprising that Ruby's Unicode support is lacking. During the time of Ruby's
genesis in Japan (the mid-1990s), Unicode was first being developed. In Unicode's
early stages, its supporters were mainly American and European, with less East Asian
involvement.
Many Japanese people opposed the process of Han unification, or collapsing most of
the Han characters common to CJKV languages into a single set of code points. The
unified Han characters tended to appeal more to Chinese speakers than Japanese
speakers.

