I have a very simple program that parses a CSV-style file whose rows are text records with columns separated by a single tab character.
I understand that split() splits on whitespace by default, so explicitly specifying a whitespace pattern isn't needed, but my question is: why won't an explicitly specified whitespace pattern work? Or is '\s' or r'\s' not the right pattern/regex? I searched Stack Overflow and found mentions of string split() being an older method, which I don't really understand since I am very new to Python. Does string split() not support regex?
Here is my code:
#!/usr/bin/env python
import os
import re
import sys

f = open(sys.argv[1])
for line in f:
    field = line.split()
    field2 = line.split('\s')
    print field[1], field2[1]
f.close()
I tried doing line.split(r'\s') and that doesn't work either, but line.split('\t') works.
Because \t really represents a tab character in a string (just as \n is the newline character; both are among Python's valid string escape sequences), but \s is a special regular expression character class for whitespace. str.split does not accept regular expressions, so it treats '\s' as the literal two characters backslash and s. If you want to split with regular expressions, you have to use re.split. Demonstration:
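A minimal sketch of the difference, written for Python 3 and assuming a made-up tab-separated sample line (the data and variable names are illustrative, not taken from the question's file):

```python
import re

# Hypothetical sample record: three fields separated by single tabs.
line = "alpha\tbeta\tgamma\n"

# str.split() with no argument splits on runs of whitespace and ignores
# leading/trailing whitespace, so the trailing newline disappears.
by_default = line.split()
print(by_default)        # ['alpha', 'beta', 'gamma']

# str.split(r'\s') searches for the literal characters backslash + 's'.
# They do not occur in the line, so the whole line comes back as one item.
by_literal = line.split(r'\s')
print(by_literal)        # ['alpha\tbeta\tgamma\n']

# '\t' is a real tab character, so this splits on tabs; the newline
# stays attached to the last field.
by_tab = line.split('\t')
print(by_tab)            # ['alpha', 'beta', 'gamma\n']

# re.split() does accept a regular expression. The line is stripped first
# so the trailing newline does not produce an empty final field.
by_regex = re.split(r'\s', line.strip())
print(by_regex)          # ['alpha', 'beta', 'gamma']
```

This also explains the failure in the question: with split('\s') the result has only one element, so indexing it with [1] raises an IndexError.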