
Brad Ediger

"Advanced Rails"

If you serve HTML in UTF-8, the data you receive through form posts
will be UTF-8. But data also reaches your application from other external sources, and those may use other encodings:
• Forms on third-party sites that are pointed at your server may not be encoded in UTF-8.
These forms will post their data in their original character set.
• When interacting with other systems through web services or messaging, a character
set and encoding must be agreed upon.
• When retrieving data from the Web (with net/http or open-uri), you must be
sure to convert text from its source encoding into your working encoding.
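Data from these sources arrives as raw bytes whose encoding you cannot always trust. As a hedged sketch for Ruby 1.9 and later (newer than the Ruby 1.8 this chapter targets), the hypothetical helper to_utf8 below keeps input that is already valid UTF-8 and otherwise assumes it is Latin-1 and transcodes it. That fallback is only a guess; when the sender declares a charset, use that instead.

```ruby
# Sketch assuming Ruby 2.0+ (String#b, #force_encoding, #valid_encoding?, #encode).
# to_utf8 is a hypothetical helper, not part of Rails or the standard library.
# ASSUMPTION: any input that is not valid UTF-8 was sent as Latin-1 (ISO-8859-1).
def to_utf8(raw)
  # First try reading the bytes as UTF-8 without changing them.
  s = raw.dup.force_encoding(Encoding::UTF_8)
  return s if s.valid_encoding?

  # Otherwise reinterpret the bytes as Latin-1 and transcode to UTF-8.
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

utf8_post   = "caf\xC3\xA9".b # bytes a UTF-8 form would post for "café"
latin1_post = "caf\xE9".b     # bytes a Latin-1 form would post for "café"

puts to_utf8(utf8_post)   # prints café
puts to_utf8(latin1_post) # prints café
```

Every byte sequence is valid Latin-1, so the fallback never raises; it can, however, silently mislabel data that was really in some third encoding.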
To convert such text, you can use the iconv library, which is part of the Ruby standard
library. We used it earlier to strip invalid characters out of our UTF-8 strings.
To convert a string from one encoding to another, create an Iconv object, providing
the destination and source encodings, and call its iconv instance method:
require 'iconv'
# Latin-1 (ISO-8859-1) equivalent of "café"
# Latin-1 E9 == "é"
cafe_latin1 = "caf#{"E9".hex.chr}"
ic = Iconv.new("utf-8", "iso-8859-1") # to_encoding, from_encoding
cafe_utf8 = ic.iconv(cafe_latin1)
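The Iconv call above is Ruby 1.8 style: iconv was deprecated in Ruby 1.9.3 and removed from the standard library in Ruby 2.0. On newer Rubies, String#encode performs the same conversion, and String#scrub (Ruby 2.1+) covers the other job iconv was used for earlier, dropping invalid bytes. A minimal sketch, assuming Ruby 2.1 or later:

```ruby
# Sketch assuming Ruby 2.1+, where String#encode replaces Iconv#iconv.
# "\xE9" is the Latin-1 (ISO-8859-1) byte for "é".
cafe_latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)

# Transcode: only the destination is given; the source encoding
# comes from the string itself.
cafe_utf8 = cafe_latin1.encode(Encoding::UTF_8)
puts cafe_utf8 # prints café

# Stripping invalid bytes from supposedly UTF-8 data:
# "\xFF" can never appear in valid UTF-8, so scrub("") removes it.
dirty = "caf\xFF\xC3\xA9".dup.force_encoding(Encoding::UTF_8)
clean = dirty.scrub("")
puts clean # prints café
```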
We can play with the $KCODE variable to change how we see the output. If we set
$KCODE to "U", the string is interpreted as UTF-8 and we see the properly converted
"café." If $KCODE is "A", the string is interpreted as a series of bytes, and so we see the
unprintable characters escaped:
cafe_latin1 # => "caf\351"
$KCODE = "U"
cafe_utf8 # => "café"
$KCODE = "A"
cafe_utf8 # => "caf\303\251"
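In Ruby 1.9 and later, setting $KCODE has no effect; each String carries its own encoding instead, so the byte-level view shown above comes from String#bytes. On those versions String#length counts characters, while String#bytesize counts bytes. A sketch assuming Ruby 1.9 or later, rebuilding the two strings from the example above:

```ruby
# Sketch assuming Ruby 1.9+, where every String knows its own encoding.
cafe_latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)
cafe_utf8   = cafe_latin1.encode(Encoding::UTF_8)

# Raw bytes: the same information $KCODE = "A" exposed in Ruby 1.8.
p cafe_latin1.bytes # => [99, 97, 102, 233]
p cafe_utf8.bytes   # => [99, 97, 102, 195, 169]

# Characters versus bytes.
p cafe_latin1.length   # => 4
p cafe_latin1.bytesize # => 4
p cafe_utf8.length     # => 4
p cafe_utf8.bytesize   # => 5
```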
As usual, we can see the byte length of each string with String#length:
cafe_latin1.

