I have to work with text that was previously copied and pasted from an Excel document into a .txt file. There are a few characters that I assume mean something to Excel but that show up as an unrecognised character (e.g. that ‘?’ symbol in gedit, or one of those rectangles in some other text editors). I want to strip those out somehow, but I’m unsure how to do so. I know regular expressions can be helpful, but I can’t see an obvious pattern that matches unrecognisable characters. How should I go about doing this?
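You could maybe use http://spreadsheet.rubyforge.org/ to read and parse the data straight from the Excel file, if the original is still available, rather than cleaning up the pasted text.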
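If only the .txt file is left, here is a minimal sketch of stripping the stray characters in plain Ruby. It assumes the file is meant to be UTF-8 and that the "unrecognised" characters are either invalid byte sequences or non-printable control characters; neither is confirmed by the question. The helper name `strip_unrecognised` and the file name in the usage comment are made up for illustration.

```ruby
# Hypothetical helper: assumes the text is supposed to be UTF-8.
def strip_unrecognised(raw)
  # Reinterpret the raw bytes as UTF-8 without converting them.
  text = raw.dup.force_encoding(Encoding::UTF_8)
  # Drop byte sequences that are not valid UTF-8.
  text = text.scrub('')
  # Drop anything non-printable, but keep tabs and line breaks.
  text.gsub(/[^[:print:]\t\r\n]/, '')
end

# Usage (the path is a placeholder):
#   cleaned = strip_unrecognised(File.binread('data.txt'))
#   File.write('data_clean.txt', cleaned)
```

If the file turns out to be in another encoding (Windows-1252 is common for text that came out of Excel), it would probably be better to convert it with `String#encode` first instead of deleting bytes, since otherwise legitimate accented characters could be lost.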