Here, we will explore it in more detail.
Recall that the global variable $KCODE determines the current character encoding, and
thus influences how Ruby treats your strings. In Rails 1.2 and later, Initializer sets
$KCODE to 'u', so all processing is assumed to be in UTF-8 unless otherwise specified.
Rails includes a library called ActiveSupport::Multibyte that provides a way to deal
with multibyte characters on top of Ruby. At this time, only UTF-8 is supported. The
encoding is derived from the current value of $KCODE.
Multibyte adds a String#chars instance method, which returns a proxy (of type
ActiveSupport::Multibyte::Chars) to that string. This proxy delegates to a handler,
depending on the current encoding. (Right now, the only handlers are a UTF-8 handler
for $KCODE = 'u' and a pass-through handler for everything else.) The Chars object
uses method_missing to trap unknown calls and send them to the handler. If the
handler cannot deal with them, they are sent to the original String.
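The delegation chain described above can be illustrated with a minimal sketch. This is not the ActiveSupport source: SketchChars and SketchUTF8Handler are hypothetical names invented for this example, and the handler implements only two operations. It shows the general shape of a method_missing proxy that tries a handler first and falls back to the wrapped String.

```ruby
# Hypothetical handler (not part of Rails): a few character-aware
# operations for UTF-8 strings, built on String#unpack("U*").
class SketchUTF8Handler
  def self.size(str)
    str.unpack("U*").size
  end

  def self.reverse(str)
    str.unpack("U*").reverse.pack("U*")
  end
end

# Hypothetical proxy (not the real ActiveSupport::Multibyte::Chars).
class SketchChars
  def initialize(str, handler = SketchUTF8Handler)
    @string  = str
    @handler = handler
  end

  def to_s
    @string
  end

  # Trap unknown calls: try the handler first, then the original String.
  def method_missing(name, *args, &block)
    result =
      if @handler.singleton_methods(false).include?(name)
        @handler.send(name, @string, *args, &block)
      else
        @string.send(name, *args, &block)
      end
    # Re-wrap String results so that calls can be chained on the proxy.
    result.is_a?(String) ? self.class.new(result, @handler) : result
  end

  def respond_to_missing?(name, include_private = false)
    @handler.singleton_methods(false).include?(name) ||
      @string.respond_to?(name, include_private) || super
  end
end

chars = SketchChars.new("résumé")
chars.size             # => 6, answered by the handler
chars.reverse.to_s     # => "émusér", answered by the handler
chars.include?("sum")  # => true, passed through to String
```

The sketch assumes a current Ruby, where singleton_methods returns symbols. Re-wrapping String results in a new proxy is one possible design choice; it is what makes an explicit to_s necessary at the end of a chain.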
The most important feature Multibyte provides is the ability to split strings on character
boundaries, rather than byte boundaries. All you need to do is call the
String#chars method and optionally convert back to a String when you are done:
$KCODE = 'u'
str = "résumé" # => "résumé"
str[0..1] # => "r\303"
str.chars[0..1].to_s # => "ré"
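To see why the byte-based slice above cuts a character in half, it can help to look at the string's raw bytes next to its code points. The following is a small sketch that uses only String#unpack and Array#pack from core Ruby rather than Multibyte itself; the variable names are illustrative.

```ruby
str = "résumé"

bytes      = str.unpack("C*")  # one integer per byte
codepoints = str.unpack("U*")  # one integer per UTF-8 character

bytes.size       # => 8, because each "é" occupies two bytes
codepoints.size  # => 6

# The first two bytes are "r" plus only the first byte of "é"
# (195 decimal is 303 octal, the "\303" seen earlier).
bytes[0..1]                  # => [114, 195]

# The first two code points form the complete characters "r" and "é".
codepoints[0..1].pack("U*")  # => "ré"
```

Slicing the code point array and packing it again is roughly the effect of slicing on character boundaries. It is shown here only to make the difference visible, not as a description of how the UTF-8 handler is implemented.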
Multibyte also provides case conversion, which can differ vastly among languages:
str.