Update: Apparently these are control characters, not Unicode characters.
I’m trying to parse an XML file which has an odd character in it that makes it invalid and is causing my tools (Firefox, Nokogiri) to complain.
Here’s what the character looks like in Firefox, and what it looks like when I copy and paste it into Textmate (I’m on OS X obviously).
crazy characters http://img.skitch.com/20090811-ghu43k5u9nhpcjmh443dpq76jp.preview.jpg
Rather than just cryptic icons and little grey diamonds, I’d really like to know what these characters are (e.g. their hex/decimal codes), but I’m not sure how to figure that out.
I would save the page in Firefox to a file and pass it to hexdump -C. Look for the fragment of markup around the character in the ASCII column on the right, then read off the corresponding hex bytes on the left. Most likely the text is UTF-8, so expect a multi-byte sequence.
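If scanning the hexdump output by eye gets tedious, the same search can be scripted. The following is only a rough sketch, assuming the file is in an ASCII-compatible encoding such as UTF-8 and that the offenders are the control characters XML 1.0 forbids (everything below 0x20 except tab, line feed and carriage return). The name find_control_bytes and its context parameter are made up for illustration.

```python
import re

# Assumption: the culprits are control bytes that XML 1.0 does not allow,
# i.e. everything below 0x20 except tab (0x09), LF (0x0A) and CR (0x0D).
# In UTF-8 these byte values never occur inside a multi-byte sequence,
# so scanning the raw bytes is safe.
FORBIDDEN = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_control_bytes(data: bytes, context: int = 10):
    """Return (offset, hex code, surrounding bytes) for each forbidden byte."""
    hits = []
    for match in FORBIDDEN.finditer(data):
        pos = match.start()
        # Keep a few bytes on either side so the spot can be found in the file.
        snippet = data[max(0, pos - context):pos + context + 1]
        hits.append((pos, f"0x{data[pos]:02X}", snippet))
    return hits
```

To use it, read the saved file in binary mode (for example open("page.xml", "rb").read(), where page.xml is a placeholder name) and print each tuple. The offset should match the left-hand column of hexdump -C once converted to hex.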