
Brad Ediger

"Advanced Rails"

Here, we will explore it in more detail.
Recall that the global variable $KCODE determines the current character encoding, and
thus influences how Ruby treats your strings. In Rails 1.2 and later, Initializer sets
$KCODE to 'u', so all processing is assumed to be in UTF-8 unless otherwise specified.
Rails includes a library called ActiveSupport::Multibyte that provides a way to deal
with multibyte characters on top of Ruby. At this time, only UTF-8 is supported. The
encoding is derived from the current value of $KCODE.
Multibyte adds a String#chars instance method, which returns a proxy (of type
ActiveSupport::Multibyte::Chars) to that string. This proxy delegates to a handler,
depending on the current encoding. (Right now, the only handlers are a UTF-8 handler
for $KCODE = 'u' and a pass-through handler for everything else.) The Chars object
uses method_missing to trap unknown calls and send them to the handler. If the
handler cannot deal with them, they are sent to the original String.
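The delegation chain just described can be sketched in plain Ruby. This is only an illustration of the pattern, not the ActiveSupport source: the names CharsSketch and Utf8HandlerSketch are hypothetical, the handler implements just two operations, and wrapping String results back into a proxy is an assumption made here so that calls can be chained.

```ruby
# Hypothetical handler (not the Rails one): works on UTF-8 code points
# rather than bytes. Only two operations are implemented for illustration.
module Utf8HandlerSketch
  def self.size(str)
    str.unpack('U*').size
  end

  def self.reverse(str)
    str.unpack('U*').reverse.pack('U*')
  end
end

# Hypothetical proxy: tries the handler first, then the wrapped String.
class CharsSketch
  def initialize(string, handler = Utf8HandlerSketch)
    @string  = string
    @handler = handler
  end

  # Convert back to a plain String when done.
  def to_s
    @string
  end

  # Trap unknown calls. If the handler knows the method, it receives the
  # string as its first argument; otherwise the call goes to the String.
  def method_missing(name, *args, &block)
    result =
      if @handler.respond_to?(name)
        @handler.send(name, @string, *args, &block)
      else
        @string.send(name, *args, &block)
      end
    # Assumption: String results are re-wrapped so calls can be chained.
    result.is_a?(String) ? self.class.new(result, @handler) : result
  end

  def respond_to_missing?(name, include_private = false)
    @handler.respond_to?(name) ||
      @string.respond_to?(name, include_private) || super
  end
end

proxy = CharsSketch.new("résumé")
proxy.size          # handled by the handler: counts characters, not bytes
proxy.reverse.to_s  # handled by the handler: reverses by character
```

A call such as CharsSketch.new("abc").upcase is not known to the sketch handler, so it falls through to String#upcase on the wrapped string.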
The most important feature Multibyte provides is the ability to split strings on character
boundaries, rather than byte boundaries. All you need to do is call the
String#chars method and optionally convert back to a String when you are done:
$KCODE = 'u'
str = "résumé" # => "résumé"
str[0..1] # => "r\303"
str.chars[0..1].to_s # => "ré"
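To see why the byte slice above ends in "\303", it can help to look at the bytes and code points directly. The following is a minimal sketch, assuming UTF-8 input; char_slice_sketch is a hypothetical helper written for this illustration, not a Rails method. It relies only on String#unpack and Array#pack with the 'C' (byte) and 'U' (UTF-8 code point) directives.

```ruby
# Hypothetical helper: slice a UTF-8 string on character boundaries by
# decoding it to code points, slicing the array, and re-encoding.
def char_slice_sketch(str, range)
  codepoints = str.unpack('U*')          # one Integer per character
  (codepoints[range] || []).pack('U*')   # back to a UTF-8 string
end

str = "résumé"

# The first two bytes are "r" (114) and the lead byte of "é"
# (195, which is 0303 in octal), so a byte slice cuts "é" in half.
first_two_bytes = str.unpack('C*')[0..1]

# The first two characters are "r" and the complete "é".
first_two_chars = char_slice_sketch(str, 0..1)
```

Decoding the whole string to an array of code points is simple but costs memory proportional to the string length; it is used here only to make the character boundaries visible.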
Multibyte also provides case conversion, which can differ vastly among languages:
str.

