The Unicode Basic Multilingual Plane (BMP), which contains most of the scripts in
common use today, covers code points U+0000 through U+FFFF. In UTF-8, code
points in the BMP can be encoded in three or fewer bytes. Although Unicode provides
17 planes of characters (65,536 code points each, or 1,114,112 in total), only about
10% of the available space has been assigned thus far.
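The size claims above can be checked directly. What follows is a minimal sketch, not code from this chapter. The byte ranges are the standard UTF-8 ones (RFC 3629). The helper names `utf8_byte_count` and `plane_of` are hypothetical and exist only for illustration. The cross-check uses Ruby's real `Array#pack` with the `"U"` directive, which encodes a code point as UTF-8.

```ruby
# Hypothetical helper: bytes UTF-8 needs for a code point (ranges per RFC 3629).
def utf8_byte_count(cp)
  raise ArgumentError, "outside the Unicode code space" if cp < 0 || cp > 0x10FFFF
  if cp <= 0x7F then 1       # ASCII
  elsif cp <= 0x7FF then 2   # e.g. Latin-1 supplement, Greek, Cyrillic
  elsif cp <= 0xFFFF then 3  # rest of the BMP, including CJK ideographs
  else 4                     # supplementary planes 1-16
  end
end

# Hypothetical helper: a code point's plane is its value divided by 0x10000.
def plane_of(cp)
  cp >> 16
end

[0x41, 0xE9, 0x6F22, 0xFFFD, 0x1F600].each do |cp|
  encoded = [cp].pack("U")  # "U" packs the code point as UTF-8
  printf("U+%04X plane %d: %d bytes (computed %d)\n",
         cp, plane_of(cp), encoded.bytesize, utf8_byte_count(cp))
end
```

Every code point in plane 0 (the BMP) reports at most 3 bytes. U+1F600 lies in plane 1 and needs 4.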
Rails and Unicode
Ruby 1.8 has less-than-ideal Unicode support compared to its contemporaries
such as Java and the .NET languages. To Ruby, strings are just sequences of 8-bit
bytes, while the character and string types of the Java runtime and .NET CLR are
based on Unicode (specifically, 16-bit UTF-16 code units). While Ruby's approach
simplifies the language, most developers today need Unicode support. Luckily, Ruby
is flexible enough that we can tack support for Unicode onto the language in a
relatively friendly way.
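The gap between "a sequence of bytes" and "a sequence of code points" is easiest to see side by side. What follows is a minimal sketch. It uses only the `"C*"` (unsigned bytes) and `"U*"` (UTF-8 code points) directives of `Array#pack` and `String#unpack`, which exist in Ruby 1.8 as well as later versions. The sample word is an arbitrary choice for illustration.

```ruby
# Build "Grüße" from code points so the source file's encoding doesn't matter.
# ü (U+00FC) and ß (U+00DF) each take 2 bytes in UTF-8.
s = [0x47, 0x72, 0xFC, 0xDF, 0x65].pack("U*")

bytes       = s.unpack("C*")  # the raw 8-bit bytes: how Ruby 1.8 sees the string
code_points = s.unpack("U*")  # the same data decoded as Unicode code points

# Ruby 1.8's String#length counts bytes (7 here); Ruby 1.9+ counts characters (5).
puts "bytes (#{bytes.length}): #{bytes.inspect}"
puts "code points (#{code_points.length}): #{code_points.inspect}"
```

Byte-oriented operations such as slicing by index can split a multibyte character in half. That is why Unicode support has to be layered on top of Ruby 1.8 strings rather than taken for granted.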
It is not surprising that Ruby's Unicode support is lacking. Ruby was created in
Japan in the mid-1990s, when Unicode was still in its early stages of development.
At that time, Unicode's supporters were mainly American and European, with less
East Asian involvement.
Many Japanese people opposed Han unification: the process of collapsing most of
the Han characters common to the CJKV languages into a single set of code points.
The unified Han characters tended to appeal more to Chinese speakers than to
Japanese speakers.