
Brad Ediger

"Advanced Rails"

upcase # => "RéSUMé"
str.chars.upcase.to_s # => "RÉSUMÉ"
Method calls on chars can be chained, because the Chars methods return Chars
objects rather than Strings. Even methods that are proxied back to the original
String have their String return values converted to Chars objects.
str.chars[0..1].upcase.to_s # => "RÉ"
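The wrapping idea behind that chaining can be sketched in plain Ruby. This is only an illustration, not ActiveSupport's implementation: MiniChars is a hypothetical class name, and it relies on a Ruby version (2.4 or later) whose String#upcase already handles non-ASCII letters.

```ruby
# MiniChars is a hypothetical stand-in, NOT ActiveSupport's Chars class.
# It only demonstrates re-wrapping String results so calls can be chained.
class MiniChars
  def initialize(string)
    @string = string
  end

  # Return the wrapped String, ending the chain.
  def to_s
    @string
  end

  # Forward unknown methods to the wrapped String. String results are
  # wrapped again; any other result (Integer, Array, ...) is returned as-is.
  def method_missing(name, *args, &block)
    return super unless @string.respond_to?(name)
    result = @string.public_send(name, *args, &block)
    result.is_a?(String) ? self.class.new(result) : result
  end

  def respond_to_missing?(name, include_private = false)
    @string.respond_to?(name, include_private) || super
  end
end

str = "r\u00E9sum\u00E9"                          # "résumé"
chained = MiniChars.new(str)[0..1].upcase.to_s   # "RÉ" on Ruby 2.4+
puts chained
```

Both [] and upcase are forwarded to the String, and each String result comes back wrapped, so the chain stays inside the proxy until to_s is called.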
The implementation of Multibyte is itself fascinating; the tables of composition
maps, codepoints, case maps, and other details are generated automatically from
tables at the Unicode Consortium web site and stored in
active_support/values/unicode_tables.dat. The generator can be found in
active_support/multibyte/generators/generate_tables.rb.
Unicode Normalization
As with any increasingly complicated encoding, normalization and canonicalization
are important issues with Unicode. One representation on paper (or screen) may
map to multiple encodings. In some cases, it may be more desirable to treat those
sequences identically, but in other cases we may need to treat them differently.
One complicating issue is character composition. Unicode provides multiple versions
of some characters, for various reasons. For example, the ö in the German word schön
can be encoded as either ö (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS)
or as the combination of o (U+006F LATIN SMALL LETTER O) and ◌̈ (U+0308
COMBINING DIAERESIS). The two representations use different byte sequences,
and therefore they would not compare as equivalent to a byte-oriented procedure.
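The difference between the two representations can be checked directly. The sketch below uses String#unicode_normalize from Ruby's standard library (Ruby 2.2 or later) rather than ActiveSupport's Multibyte, which is an assumption about the reader's Ruby version; the variable names are illustrative only.

```ruby
# Two encodings of the same visible word "schön".
precomposed = "sch\u00F6n"    # ö as U+00F6
decomposed  = "scho\u0308n"   # o (U+006F) followed by U+0308 COMBINING DIAERESIS

# The byte sequences differ, so a plain String comparison fails.
puts precomposed.bytes.length     # 6
puts decomposed.bytes.length      # 7
puts precomposed == decomposed    # false

# Normalizing both sides to the same form (NFC here) makes them compare equal.
nfc_equal = precomposed.unicode_normalize(:nfc) == decomposed.unicode_normalize(:nfc)
puts nfc_equal                    # true
```

Whether to normalize before comparing depends on the application, since in some cases the two sequences should be treated as distinct.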