
Brad Ediger

"Advanced Rails"


Another example is compatibility characters, or characters that were introduced into
Unicode for compatibility with older encodings. One area where this occurs is typographical
ligatures (see Figure 8-2).
The text on the left does not use a ligature. For typographical reasons, the style on
the right is usually used for the combination of f and i. The original intent of Unicode
was that a smart rendering system would replace the consecutive code points f
and i with the appropriate ligature. However, many systems turned out not to be
capable of this advanced rendering (Mac OS X being a notable exception). Therefore,
common ligatures were given their own code points, so that they could be
embedded in a body of text and rendered (with a suitable font including those ligatures)
with a dumb client. In this case, the ligature ﬁ is U+FB01 LATIN SMALL
LIGATURE FI.
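
As a minimal sketch of the difference, the following Ruby snippet lists the code points behind the ligature and behind the plain letters. It assumes a Ruby version with UTF-8 string literals and \u escapes (1.9 or later); the variable names are illustrative only.

```ruby
# The ligature is one code point; the plain spelling is two.
ligature = "\uFB01"   # LATIN SMALL LIGATURE FI
plain    = "fi"       # LATIN SMALL LETTER F + LATIN SMALL LETTER I

# Format each code point in the usual U+XXXX notation.
ligature_codepoints = ligature.codepoints.map { |cp| format("U+%04X", cp) }
plain_codepoints    = plain.codepoints.map { |cp| format("U+%04X", cp) }

puts ligature_codepoints.inspect   # ["U+FB01"]
puts plain_codepoints.inspect      # ["U+0066", "U+0069"]

# A plain string comparison treats the two spellings as different text.
puts ligature == plain             # false
```
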
To support character composition on platforms with less complex rendering systems,
Unicode includes precomposed characters, such as the ö shown earlier (U+00F6
LATIN SMALL LETTER O WITH DIAERESIS). Compatibility characters such as
the typographical ligatures are often precomposed. In order to properly compare and
collate strings that may include both combining characters and precomposed characters,
the strings must be canonicalized, or reduced to a well-known form such that
two strings that are “the same” (by some definition) will always map to the same
sequence of code points.
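
One way to sketch canonicalization is with String#unicode_normalize, which is part of the standard library in Ruby 2.2 and later. This is an assumption about the reader's Ruby version and not necessarily the approach the surrounding text has in mind. The helper same_text? is a hypothetical name introduced only for this illustration.

```ruby
# Hypothetical helper: compare two strings after reducing both to the
# same normalization form (:nfc, :nfd, :nfkc, or :nfkd).
def same_text?(a, b, form = :nfc)
  a.unicode_normalize(form) == b.unicode_normalize(form)
end

precomposed = "\u00F6"    # LATIN SMALL LETTER O WITH DIAERESIS
combining   = "o\u0308"   # o followed by COMBINING DIAERESIS

puts precomposed == combining             # false: different code points
puts same_text?(precomposed, combining)   # true: identical after NFC

# The fi ligature is a compatibility character, so canonical
# normalization (NFC) leaves it alone; the compatibility form (NFKC)
# maps it to the plain letters.
ligature = "\uFB01"
puts same_text?(ligature, "fi", :nfc)     # false
puts same_text?(ligature, "fi", :nfkc)    # true
```

Which form to use depends on the definition of "the same" that the application needs: the canonical forms preserve the distinction between a ligature and its letters, while the compatibility forms discard it.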

