I’m using Nokogiri with an XML document that looks something like this:
<songs>
<song>
<artist>Juana Molina</artist>
<album>Un Dia</album>
<track>8</track>
<title>Dar (Qué Difícil)</title>
<rating>5</rating>
<filename>\Juana Molina\Un Dia\08 - Juana Molina - Dar (Qué Difícil).mp3</filename>
</song>
</songs>
When I try to find songs whose filenames contain escaped characters, the following XPath query returns nothing, regardless of whether I escape the filename string used in the query:
require 'nokogiri'
require 'builder'
file = File.new("songs.xml")
parser = Nokogiri::XML(file)
filename = "\Juana Molina\Un Dia\08 - Juana Molina - Dar (Qué Difícil).mp3"
xm = Builder::XmlMarkup.new
filename = xm.text! filename
# => \Juana Molina\Un Dia\08 - Juana Molina - Dar (Qué Difícil).mp3
nodes = parser.root.xpath('//songs/song[filename="'+filename+'"]')
puts nodes
What is the correct way to represent escaped characters in the XPath query?
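MRI 1.8.x has weak Unicode support, but the main idea is to unescape your string first: XPath compares against the parsed text of the node, so the query should contain the raw characters rather than their XML-escaped form.
I also believe you need to run irb or ruby with -KU to make MRI 1.8 treat strings as UTF-8.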
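As a rough sketch of the unescaping idea above, the query can be built from the raw filename without running it through Builder. The helper name xpath_literal is hypothetical (it is not part of Nokogiri); it only wraps a Ruby string as an XPath 1.0 string literal. The sketch also assumes the filename is written as a single-quoted Ruby string, because double-quoted literals treat backslashes as escape sequences and may silently drop them.

```ruby
# Hypothetical helper (not a Nokogiri API): turn a Ruby string into an
# XPath 1.0 string literal. XPath 1.0 has no escape character, so a value
# containing both quote kinds has to be assembled with concat().
def xpath_literal(value)
  return "'#{value}'" unless value.include?("'")
  return %("#{value}") unless value.include?('"')
  # Both quote kinds present: split on ' and rejoin the pieces with a
  # double-quoted apostrophe between them.
  pieces = value.split("'", -1).map { |piece| "'#{piece}'" }
  "concat(" + pieces.join(%q(, "'", )) + ")"
end

# Single quotes keep the backslashes literal. The string is the raw text,
# not an XML-escaped version of it.
filename = '\Juana Molina\Un Dia\08 - Juana Molina - Dar (Qué Difícil).mp3'

query = "//songs/song[filename=#{xpath_literal(filename)}]"
puts query
```

The resulting query string can then be passed to parser.xpath(query) in place of the hand-concatenated one. Whether this alone fixes the lookup depends on the file and the script both being read as UTF-8.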
Hope this helps.