upcase # => "RéSUMé"
str.chars.upcase.to_s # => "RÉSUMÉ"
Method calls on chars can also be chained, because Chars methods return Chars
objects rather than Strings. Even methods that are proxied back to the original
String have their String return values converted to Chars objects.
str.chars[0..1].upcase.to_s # => "RÉ"
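The re-wrapping behavior described above can be sketched with a small proxy class. This is only an illustration of the pattern, not the Rails implementation: MiniChars is a hypothetical name, and the sketch assumes Ruby 2.4 or later, where String#upcase is already Unicode-aware.

```ruby
# MiniChars is a hypothetical toy proxy, not ActiveSupport's Chars class.
class MiniChars
  def initialize(string)
    @string = string
  end

  # Forward unknown methods to the wrapped String. String results are
  # wrapped in a new proxy so that further calls can be chained.
  def method_missing(name, *args, &block)
    return super unless @string.respond_to?(name)
    result = @string.send(name, *args, &block)
    result.is_a?(String) ? self.class.new(result) : result
  end

  def respond_to_missing?(name, include_private = false)
    @string.respond_to?(name, include_private) || super
  end

  # Unwrap the proxy back into a plain String.
  def to_s
    @string
  end
end

proxy   = MiniChars.new("r\u00E9sum\u00E9")  # "résumé"
chained = proxy[0..1].upcase                 # each step returns a MiniChars
chained.class # => MiniChars
chained.to_s  # => "RÉ"
```

Non-String results, such as the Integer returned by length, are passed through unwrapped in this sketch.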
The implementation of Multibyte is itself fascinating; the tables of composition
maps, codepoints, case maps, and other details are generated automatically from
tables at the Unicode Consortium web site and stored in
active_support/values/unicode_tables.dat. The generator can be found in
active_support/multibyte/generators/generate_tables.rb.
Unicode Normalization
As with any increasingly complicated encoding, normalization and canonicalization
are important issues with Unicode. One representation on paper (or screen) may
map to multiple encodings. In some cases, it may be more desirable to treat those
sequences identically, but in other cases we may need to treat them differently.
One complicating issue is character composition. Unicode provides multiple versions
of some characters, for various reasons. For example, the ö in the German word schön
can be encoded as either ö (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS)
or as the combination of o (U+006F LATIN SMALL LETTER O) and ◌̈ (U+0308
COMBINING DIAERESIS). The two representations use different byte sequences,
so a byte-oriented comparison would not treat them as equal.
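The difference between the two encodings can be checked directly. The following is a minimal sketch that assumes Ruby 2.2 or later and uses the core String#unicode_normalize method rather than the Multibyte API discussed in this chapter; the variable names are illustrative only.

```ruby
# Two encodings of the same visible word, written with escapes so the
# code points are explicit.
precomposed = "sch\u00F6n"   # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
decomposed  = "scho\u0308n"  # U+006F followed by U+0308 COMBINING DIAERESIS

# A plain comparison looks at the underlying bytes, so the strings differ.
precomposed == decomposed    # => false
precomposed.bytes.length     # => 6
decomposed.bytes.length      # => 7

# Normalizing both to the same form (NFC here) makes them compare equal.
same_after_nfc =
  precomposed.unicode_normalize(:nfc) == decomposed.unicode_normalize(:nfc)
same_after_nfc               # => true
```

Whether to normalize before comparing depends on the application, since some cases call for treating the sequences as distinct.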