I am currently scraping an RSS feed from Last.fm, and the title attribute looks like it contains a Unicode “-” that shows up as \u2013 in Firebug. Here is the feed, for those who are curious:
http://ws.audioscrobbler.com/2.0/user/rj/recenttracks.rss
When I write something like this
feedentry.title.split('-')
it doesn’t find the Unicode dash. I have also tried this:
@feedsplit = feedentry.title.gsub(/\u2013/,'-').split("-")
and some variations like using [] ranges. No luck. I took a look at the other answers floating around, and none of them seem to work for me, so this is my last hope.
Thanks for your time!
The \u2013 syntax only works with Ruby 1.9, which is fully Unicode aware. I’m guessing that you are running Ruby 1.8.
In Ruby 1.8, you can still use the Unicode dash itself as the argument to split, either as a string or inside a regular expression.
In regular expressions, remember to set the u modifier for Unicode compatibility (outside of Rails).
Alternatively, set $KCODE = "U", which implies the u modifier for all regular expressions. Rails does this for you already.
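As a rough sketch of the two approaches described in this answer (not the original example code): the title string below is made up to stand in for feedentry.title, and the variable names are only for illustration. It assumes the source file is saved as UTF-8 so that the dash can be typed literally.

```ruby
# encoding: utf-8
# Sketch only: `title` is a hypothetical stand-in for feedentry.title.
title = "Some Artist – Some Track"   # contains a literal U+2013 en dash

# 1. Pass the dash itself as a string argument to split,
#    then trim the surrounding whitespace from each part.
by_string = title.split("–").map { |part| part.strip }

# 2. Use a regular expression with the u modifier, so the pattern is
#    treated as UTF-8 (this is the part Ruby 1.8 needs outside of Rails).
by_regex = title.split(/\s*–\s*/u)

p by_string
p by_regex
```

Both calls should give the artist and the track as separate strings. On Ruby 1.9 and later, the "\u2013" escape can be used in place of the literal dash.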