I have a file (ratings.lst) downloaded from IMDB Interfaces. The content appears to be in the following format:
Distribution Votes Rating Title
0000001222 297339 8.4 Reservoir Dogs (1992)
0000001223 64504 8.4 The Third Man (1949)
0000000115 48173 8.4 Jodaeiye Nader az Simin (2011)
0000001232 324564 8.4 The Prestige (2006)
0000001222 301527 8.4 The Green Mile (1999)
My aim is to convert this file into a CSV (comma-separated) file. The desired result, shown here for one line, is:
Distribution Votes Rating Title
0000001222, 301527, 8.4, The Green Mile (1999)
I am using TextPad, which supports regex-based search and replace. I’m not sure what regex is needed to achieve the desired result above. Can somebody please help me with this? Thanks in advance.
The other regular expressions are somewhat overcomplicated. Because whitespace is guaranteed not to appear in the first three columns, you don’t have to do a fancy match: “three columns of anything separated by whitespace” will do.
Try replacing
^(.+?)\s+(.+?)\s+(.+?)\s+(.+?)$
with
\1,\2,\3,"\4"
(I ran this in Notepad++.)
Note the use of a non-greedy quantifier, .+?, to prevent accidentally matching more than we should. Also note that I’ve enclosed the fourth column in quote marks ("") in case a comma appears in the movie title; otherwise the software you use to read the file would interpret Avatar, the Last Airbender as two columns.
The nice tabular alignment is gone, but if you open the file in Excel it will look fine.
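To see what the replacement does outside an editor, here is a minimal sketch that applies the same pattern and replacement with Python's re module. It assumes Python's regex syntax behaves like the editor's for this pattern; the function name to_csv_line is made up for illustration, and the sample lines are the ones from the question.

```python
import re

# Pattern and replacement copied from the answer above.
# Assumption: Python's `re` treats them the same way the editor does.
PATTERN = re.compile(r'^(.+?)\s+(.+?)\s+(.+?)\s+(.+?)$')
REPLACEMENT = r'\1,\2,\3,"\4"'


def to_csv_line(line):
    """Convert one whitespace-separated ratings line to a CSV line (hypothetical helper)."""
    return PATTERN.sub(REPLACEMENT, line)


sample = [
    "0000001222 297339 8.4 Reservoir Dogs (1992)",
    "0000001222 301527 8.4 The Green Mile (1999)",
]

for line in sample:
    print(to_csv_line(line))
```

The first three groups each stop at the first run of whitespace they can, so only the last group (the title) is allowed to contain spaces.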
Alternatively, just do the entire thing in Excel.
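If neither an editor nor Excel is convenient, the same idea (the first three columns never contain whitespace) can be sketched without a regex at all. This is not the method above but a swapped-in alternative: it splits each line at most three times and lets Python's csv module decide when a title needs quoting. The function name convert is hypothetical.

```python
import csv
import io


def convert(lines, out):
    """Write whitespace-separated ratings lines to `out` as CSV (hypothetical helper)."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        # Split on runs of whitespace, at most 3 times, so the title stays whole.
        fields = line.strip().split(None, 3)
        if len(fields) == 4:  # skip blank or malformed lines
            writer.writerow(fields)


buffer = io.StringIO()
convert(["0000001222 301527 8.4 The Green Mile (1999)"], buffer)
print(buffer.getvalue(), end="")
```

Unlike the regex replacement, csv.writer only adds quote marks around fields that actually need them, such as a title containing a comma.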