Another example is compatibility characters: characters that were introduced into
Unicode for compatibility with older encodings. One area where this occurs is typographical
ligatures (see Figure 8-2).
The text on the left does not use a ligature. For typographical reasons, the style on
the right is usually used for the combination of f and i. The original intent of Unicode
was that a smart rendering system would replace the consecutive code points f
and i with the appropriate ligature. However, many systems turned out not to be
capable of this advanced rendering (Mac OS X being a notable exception). Therefore,
common ligatures were given their own code points, so that they could be
embedded in a body of text and rendered (with a suitable font including those ligatures)
with a dumb client. In this case, the ligature ﬁ is U+FB01 LATIN SMALL
LIGATURE FI.
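As a rough illustration of how such a compatibility character relates to its plain-letter equivalent, the following sketch uses Python's standard unicodedata module. The helper name ligature_info is made up for this example, and Python is chosen here only for illustration.

```python
import unicodedata

ligature = "\uFB01"  # the single code point U+FB01

def ligature_info(ch):
    """Return the Unicode name and the decomposition mapping of one character."""
    return unicodedata.name(ch), unicodedata.decomposition(ch)

name, decomp = ligature_info(ligature)
print(name)    # LATIN SMALL LIGATURE FI
print(decomp)  # <compat> 0066 0069

# A compatibility normalization form (NFKC here) replaces the ligature
# with the two ordinary letters f and i.
folded = unicodedata.normalize("NFKC", ligature)
print(folded == "fi")  # True

# A canonical form (NFC here) leaves the ligature untouched, because the
# mapping to f + i is a compatibility mapping, not a canonical one.
print(unicodedata.normalize("NFC", ligature) == ligature)  # True
```

The `<compat>` tag in the decomposition is what marks U+FB01 as a compatibility character: it is only folded to f and i when a compatibility form is requested.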
To support character composition on platforms with less complex rendering systems,
Unicode includes precomposed characters, such as the ö shown earlier (U+00F6
LATIN SMALL LETTER O WITH DIAERESIS). Compatibility characters such as
the typographical ligatures are often precomposed. In order to properly compare and
collate strings that may include both combining characters and precomposed characters,
the strings must be canonicalized, or reduced to a well-known form such that
two strings that are "the same" (by some definition) will always map to the same
sequence of code points.
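A minimal sketch of this idea, again assuming Python's standard unicodedata module: the two spellings of ö compare as unequal until both are reduced to the same normalization form. The function name same_text is hypothetical, and the choice of NFC as the default form is an assumption made for the example, not something the text prescribes.

```python
import unicodedata

precomposed = "\u00F6"  # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
combining = "o\u0308"   # o followed by U+0308 COMBINING DIAERESIS

def same_text(a, b, form="NFC"):
    """Compare two strings after reducing both to one normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# The raw code point sequences differ, so a naive comparison fails.
print(precomposed == combining)           # False

# After normalization, both map to the same sequence of code points.
print(same_text(precomposed, combining))  # True

# NFD decomposes the precomposed character into base letter + combining mark;
# NFC composes the pair back into the single precomposed code point.
print(unicodedata.normalize("NFD", precomposed) == combining)  # True
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
```

Which form to use depends on the definition of "the same" that the application needs; the canonical forms (NFC, NFD) treat only canonically equivalent sequences as equal, while the compatibility forms (NFKC, NFKD) also fold compatibility characters such as ligatures.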