I’m trying to write a regex to parse an SFV file in Python.
Basically, the lines are of the format
filename crc_bytes
but whitespace can appear anywhere, including inside the file name, so the real format is
(whitespaces)filename(whitespaces)crc_bytes(whitespaces)
where filename can itself include whitespace.
Now, I’m trying to extract filename and crc_bytes. So I’ve tried:
'\s*(.+)\s+([^\s]+)'
but it parsed
' filename with spaces crc '
as
'filename with spaces ', 'crc'
//too many spaces ---^
Any idea how to get rid of these spaces? Perhaps with a look-behind somehow?
Bonus question:
Comments in SFV files are lines that start with ‘;’. If anyone can handle comments in the regex as well, I will forever be in their debt.
Thanks!!
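Handling filenames with spaces
Using (.+\S) for the filename group forces the filename to end with a non-whitespace (\S) character, so the spaces before the CRC are left for \s+ to match. The full pattern becomes '\s*(.+\S)\s+([^\s]+)'.
Avoiding comments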
You could use lookahead or add negation checks to the regex. I think, however, that adding another regex would be more readable:
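Something along these lines, as a minimal sketch: the names COMMENT_RE and lines and the sample file contents are made up for illustration, and allowing whitespace before the ';' is an assumption on my part.

```python
import re

# Hypothetical sample data: two comment lines and one real entry.
lines = [
    "; generated by some SFV tool",
    "  ; an indented comment",
    "  filename with spaces.txt   DEADBEEF  ",
]

# A line counts as a comment when its first non-whitespace character is ';'.
# (Tolerating leading whitespace before ';' is an assumption.)
COMMENT_RE = re.compile(r"\s*;")

# re.match anchors at the start of the string, so no '^' is needed.
is_comment = [bool(COMMENT_RE.match(line)) for line in lines]
```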
Now we have three lines, two of which are comment lines. The following parses only the lines that are not comments:
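A sketch of that filtering step, with the sample lines repeated so it runs on its own. ENTRY_RE, COMMENT_RE and the sample data are illustrative names and values; \S+ is used as the equivalent of [^\s]+.

```python
import re

# Hypothetical sample data: two comment lines and one real entry.
lines = [
    "; generated by some SFV tool",
    "; another comment",
    "  filename with spaces.txt   DEADBEEF  ",
]

COMMENT_RE = re.compile(r"\s*;")
# Group 1 must end in a non-whitespace character, so it carries no trailing spaces.
ENTRY_RE = re.compile(r"\s*(.+\S)\s+(\S+)")

# Note: this assumes every non-comment line is a well-formed entry;
# a blank line would make ENTRY_RE.match return None here.
entries = [
    ENTRY_RE.match(line).groups()
    for line in lines
    if not COMMENT_RE.match(line)
]
```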
Or, in a more verbose fashion:
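For example, as an explicit loop. This is only a sketch: parse_sfv_lines is a hypothetical helper name, and silently skipping blank or malformed lines is an assumption rather than anything the SFV format requires.

```python
import re

COMMENT_RE = re.compile(r"\s*;")
ENTRY_RE = re.compile(r"\s*(.+\S)\s+(\S+)")


def parse_sfv_lines(lines):
    """Return (filename, crc) pairs, skipping comments and unparsable lines."""
    entries = []
    for line in lines:
        if COMMENT_RE.match(line):
            continue  # comment line
        match = ENTRY_RE.match(line)
        if match is None:
            continue  # blank line or no second field (assumed safe to skip)
        filename, crc = match.groups()
        entries.append((filename, crc))
    return entries


# Hypothetical sample data: two comment lines, a blank line, and one real entry.
sample = [
    "; generated by some SFV tool",
    "; another comment",
    "",
    "  filename with spaces.txt   DEADBEEF  ",
]
parsed = parse_sfv_lines(sample)
```

The same function can be fed an open file object directly, since iterating over a file yields its lines.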