If you serve HTML in UTF-8, the data you receive through form posts
will be UTF-8. But there are other external sources as well:
• Forms from third-party sites pointed at your server may not be encoded in UTF-8.
  These forms will post their data in the original character set.
• When interacting with other systems through web services or messaging, a character
  set and encoding must be agreed upon.
• When retrieving data from the Web (with net/http or open-uri), you must be
  sure to convert text from its source encoding into your working encoding.
To remedy this situation, you can use the iconv library, which is part of the Ruby standard
library. We have seen this earlier; it was used to strip invalid characters out of our
UTF-8. To convert a string from one encoding to another, create an Iconv object, providing
the source and destination encodings, and call its iconv instance method:
require 'iconv'
# Latin-1 (ISO-8859-1) equivalent of "café"
# Latin-1 E9 == "é"
cafe_latin1 = "caf#{"E9".hex.chr}"
ic = Iconv.new("utf-8", "iso-8859-1") # to_encoding, from_encoding
cafe_utf8 = ic.iconv(cafe_latin1)
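Iconv was removed from the standard library in Ruby 2.0, and from Ruby 1.9 onward each string carries its own encoding. As a minimal sketch for those versions (not the approach used in this chapter), the same Latin-1 to UTF-8 conversion might be written with String#encode. The variable names simply mirror the listing above, and the code assumes the input bytes really are ISO-8859-1:

```ruby
# Sketch for Ruby 1.9+ only; assumes the source bytes are ISO-8859-1.
# Build the Latin-1 bytes for "café" and tag them with their encoding.
cafe_latin1 = ("caf" + 0xE9.chr).force_encoding("ISO-8859-1")

# Transcode to UTF-8. The source encoding is taken from the string itself.
cafe_utf8 = cafe_latin1.encode("UTF-8")

cafe_utf8.encoding         # => #<Encoding:UTF-8>
cafe_utf8.valid_encoding?  # => true
```

Here force_encoding only relabels the existing bytes, while encode actually rewrites them, which is why the two calls appear in that order.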
We can play with the $KCODE variable to change how we see the output. If we set
$KCODE to "U", the string is interpreted as UTF-8 and we see the properly converted
“café.” If $KCODE is "A", the string is interpreted as a series of bytes, and so we see the
unprintable characters escaped:
cafe_latin1 # => "caf\351"
$KCODE = "U"
cafe_utf8 # => "café"
$KCODE = "A"
cafe_utf8 # => "caf\303\251"
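$KCODE no longer has any effect in Ruby 1.9 and later, so the two views of the string have to be requested explicitly there. The following is a rough sketch of one way to do that under those versions, using only core String methods. Note that in those versions, unlike in Ruby 1.8, String#length counts characters and String#bytesize counts bytes:

```ruby
# Sketch for Ruby 1.9+ only (String#b needs 2.0+).
cafe_utf8 = "caf\u00E9"  # "café" as a UTF-8 string

# Character-oriented view
cafe_utf8.length     # => 4
cafe_utf8.chars      # => ["c", "a", "f", "é"]

# Byte-oriented view
cafe_utf8.bytesize   # => 5
cafe_utf8.bytes      # => [99, 97, 102, 195, 169]
cafe_utf8.b          # => "caf\xC3\xA9"  (binary copy of the same bytes)
```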
As usual, we can see the byte length of each string with String#length:
cafe_latin1.