
Brad Ediger

"Advanced Rails"

It is not always possible to look at a sequence of
bytes and determine their character encoding; that information must be carried
out-of-band. The more potential character sets in use, the worse this problem becomes.
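To make the ambiguity concrete, here is a minimal sketch that reads one and the same byte under three ISO-8859 encodings. It assumes Ruby 1.9 or later, where strings carry an encoding tag; the helper name decode_as is hypothetical and used only for this illustration.

```ruby
# One raw byte, 0xE9, with no encoding information attached (binary).
raw = [0xE9].pack("C")

# Hypothetical helper: interpret the raw bytes under the named encoding,
# then transcode the result to UTF-8 so the characters can be compared.
def decode_as(bytes, encoding_name)
  bytes.dup.force_encoding(encoding_name).encode("UTF-8")
end

latin1   = decode_as(raw, "ISO-8859-1")  # U+00E9, Latin small e with acute
cyrillic = decode_as(raw, "ISO-8859-5")  # U+0449, Cyrillic small shcha
greek    = decode_as(raw, "ISO-8859-7")  # U+03B9, Greek small iota

# The byte is identical in all three cases; only the out-of-band label differs.
puts [latin1, cyrillic, greek].map { |s| format("U+%04X", s.ord) }.join(" ")
```

Nothing in the byte itself says which of the three readings is intended, which is why the encoding label has to travel alongside the data.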
Another problem with the use of ASCII or extended ASCII is that it has no support
for bidirectional, or bidi, text. Some written languages, such as Hebrew and Arabic,
are written primarily right-to-left (RTL). This causes problems in rendering systems
that were designed with left-to-right (LTR) text in mind. Bidirectional text, which
combines LTR and RTL within a page or paragraph, is usually impossible with
ASCII or extended ASCII.
The worst limitation of the extended-ASCII model is that it still only provides support
for a maximum of 256 characters. This is not nearly enough for East Asian languages
(the so-called CJK or CJKV languages, for Chinese, Japanese, Korean, and
Vietnamese), which are ideographic and can require tens of thousands of characters
for adequate coverage. There are several encodings that cover the CJKV languages
specifically, but they do not solve the general problem of having too many encodings.
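The 256-character ceiling can be seen directly in a short sketch, again assuming Ruby 1.9 or later. The sample character (U+65E5) and the choice of Shift_JIS as a CJK-specific encoding are illustrative assumptions, not taken from the text above.

```ruby
# A single byte can distinguish at most 2**8 values.
single_byte_capacity = 2**8

# U+65E5, a common CJK ideograph, chosen only as an example.
ideograph = "\u65E5"

# Try to represent it in a single-byte, extended-ASCII encoding.
fits_in_latin1 =
  begin
    ideograph.encode("ISO-8859-1")
    true
  rescue Encoding::UndefinedConversionError
    # ISO-8859-1 has no slot for this character.
    false
  end

# A CJK-specific encoding can hold it, but needs more than one byte.
shift_jis_bytes = ideograph.encode("Shift_JIS").bytesize

puts "capacity=#{single_byte_capacity} latin1=#{fits_in_latin1} sjis_bytes=#{shift_jis_bytes}"
```

The character converts cleanly to the CJK-specific encoding, but that only moves the problem: the data now depends on yet another encoding label.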
Unicode
The extended-ASCII model was successful for many years, and the ISO-8859 encodings
provided a good way to support different world scripts. However, the limitations
became increasingly bothersome; multiple languages could not be supported
within one document, and the CJKV languages had their own independently developed
character sets and encodings.

