I have a problem with one of my validation regexes when the input contains non-ASCII UTF-8 characters. So I ran a few experiments, and it appears that Ruby regexes behave differently inside the Rails environment than in plain Ruby.
Here is my experiment with a Chinese string.
In plain Ruby:
```ruby
string = "運動會"
puts string[/\A[\w]*\z/]
# => prints 運動會 (match, OK)
```
In Rails, inside a rake task that loads the environment:
```ruby
# coding: utf-8
task :test => :environment do
  string = "運動會"
  puts string[/\A[\w]*\z/]
end
```

```
$ rake test
=> nothing (no match, not OK)
```
If I omit `# coding: utf-8`, Ruby fails with `invalid multibyte char (US-ASCII)`. In any case, even with the magic comment, it doesn't match.
Of course, I have checked the obvious things (the Ruby version, and that the script files are saved as UTF-8).
I am using:
- Rails 3.0.7
- Ruby 1.9.2 (ruby-1.9.2-p180)
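To make the "checked everything" step repeatable, a minimal sketch like the one below could be run both with plain `ruby` and from inside the rake task, and the two outputs compared. The helper `regex_environment_report` is a hypothetical name made up for this example; it only relies on core Ruby (`RUBY_VERSION`, `__ENCODING__`, `String#encoding`, `String#valid_encoding?`).

```ruby
# coding: utf-8
# Hypothetical diagnostic helper (not part of Ruby or Rails): reports the
# interpreter version, the encodings involved, and whether plain \w matches.
# Run it in both contexts (plain `ruby` and the rake task) and compare.

def regex_environment_report(sample)
  {
    ruby_version:    RUBY_VERSION,
    source_encoding: __ENCODING__.to_s,       # encoding of this source file
    string_encoding: sample.encoding.to_s,    # encoding of the string literal
    valid_encoding:  sample.valid_encoding?,  # false would mean mangled bytes
    ascii_w_match:   !sample[/\A\w*\z/].nil?  # does the plain \w pattern match?
  }
end

report = regex_environment_report("運動會")
report.each { |key, value| puts "#{key}: #{value}" }
```

It assumes Ruby 1.9 or later (it uses the `key: value` hash syntax and `String#encoding`), so if it refuses to run at all in one of the two contexts, the interpreter there is probably not the 1.9.2 you expected.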
So my conclusion is that Rails alters the way regexes behave, and I have not found a way to make them behave as they do in plain Ruby.
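OK, I found the answer to my problem. In Ruby 1.9, `\w` matches only ASCII word characters (`[a-zA-Z0-9_]`), whereas in Ruby 1.8 it matched Unicode characters as well. In Ruby 1.9 you now have to use `[\w\P{ASCII}]`, i.e. any ASCII word character or any non-ASCII character, so the validation regex becomes `/\A[\w\P{ASCII}]*\z/`.

More info: http://www.ruby-forum.com/topic/210770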
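To see the difference concretely, here is a small sketch (not taken from the linked thread) that runs the original `\w` pattern, the `[\w\P{ASCII}]` workaround, and, as an extra alternative, the POSIX-style `[[:word:]]` class against a few inputs. It assumes Ruby 1.9 or later; the helper name `full_match?` is made up for illustration, and I have only relied on `[[:word:]]` behaving this way on recent Ruby versions, not specifically on 1.9.2.

```ruby
# coding: utf-8
# Sketch comparing character classes, assuming Ruby 1.9+ (where \w is ASCII-only).
# [\w\P{ASCII}] : the workaround above, any ASCII word char OR any non-ASCII char.
# [[:word:]]    : extra alternative (an assumption, not from the thread); Unicode
#                 Letter, Mark, Number or Connector_Punctuation.

PATTERNS = {
  "\\w only"        => /\A\w*\z/,
  "[\\w\\P{ASCII}]" => /\A[\w\P{ASCII}]*\z/,
  "[[:word:]]"      => /\A[[:word:]]*\z/
}

# Hypothetical helper: true when the whole string matches the anchored pattern.
def full_match?(pattern, string)
  !string[pattern].nil?
end

["運動會", "abc_123", "運動會。"].each do |input|
  PATTERNS.each do |name, pattern|
    puts "#{input.inspect} vs #{name}: #{full_match?(pattern, input)}"
  end
end
```

The design trade-off: `[\w\P{ASCII}]` accepts every non-ASCII character, including punctuation such as the ideographic full stop "。", so "運動會。" passes it, while `[[:word:]]` rejects that input because "。" is punctuation rather than a letter or number. Which one fits depends on how strict the validation needs to be.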