I am trying to create a list based on the input below, but I don’t see the expected output. Can anyone suggest where I am going wrong?
INPUT:
CR FA CL Title
409452 WLAN 656885 Age out RSSI values from buffer in Beacon miss scenario
415560 WLAN 656886 To Record SMD Event Logging
I want an OUTPUT like
[['CR', 'FA', 'CL', 'TITLE'], ['409452', 'WLAN', '656885', 'Age out RSSI values from buffer in Beacon miss scenario'], ['415560', 'WLAN', '656886','To Record SMD Event Logging']]
But I see it’s getting created like
[['CR', 'FA', 'CL', 'TITLE'], ['', '409452', 'WLAN', '656885\tAge out RSSI values from buffer in Beacon miss scenario'], ['', '415560', 'WLAN', '656886\tTo Record SMD Event Logging']]
Python code
import re
CRlist = []
# info holds the input text shown above
for i in info.splitlines():
    index = re.split(r'\W+',i,3)
    CRlist.append(index)
The output you’re getting is exactly what you’d expect if there were extra whitespace at the start of each line but the first.
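To make that concrete, here is a minimal sketch that reproduces the symptom. The sample strings are assumptions: the tab before the title is taken from the output in the question, and the single leading space on `padded` is hypothetical, since the real input may carry a different stray character.

```python
import re

# Assumed sample line; the tab before the title matches the reported output.
clean = "409452 WLAN 656885\tAge out RSSI values from buffer in Beacon miss scenario"
# Hypothetical: the same line with one stray leading space.
padded = " " + clean

clean_fields = re.split(r'\W+', clean, 3)
padded_fields = re.split(r'\W+', padded, 3)

print(clean_fields)
print(padded_fields)
```

With the leading space, the first match of `\W+` sits at the very start of the string, so the first field is empty and one of the three allowed splits is used up before the title is separated from the third column.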
One common reason for this is that you’ve tried parsing files with the wrong line endings, without using universal-newlines mode, and just gotten things hopelessly confused.
For example, two lines may look identical in your text editor, but your re.split will do very different things with them.
The solution is to strip off the excess whitespace. You can try to write a more complicated regexp, or just do re.split(r'\W+', s.lstrip(), 3).
Since you mentioned wanting to strip trailing whitespace as well, use strip instead of lstrip: re.split(r'\W+', s.strip(), 3).
But I’m not sure why you’re using a regexp in the first place, when you could just do s.strip().split(None, 3).
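Putting the last suggestion into the original loop might look like the sketch below. The contents of `info` are an assumption (tab-separated columns with a stray leading space on every line but the first), and `parse_rows` is a hypothetical helper name, not something from the question.

```python
# Assumed input: tab-separated columns, with a stray leading space
# on every line except the header.
info = (
    "CR\tFA\tCL\tTitle\n"
    " 409452\tWLAN\t656885\tAge out RSSI values from buffer in Beacon miss scenario\n"
    " 415560\tWLAN\t656886\tTo Record SMD Event Logging\n"
)

def parse_rows(text):
    """Split each non-blank line into at most four whitespace-separated fields."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()          # drop leading/trailing whitespace
        if stripped:                     # skip blank lines
            rows.append(stripped.split(None, 3))  # title keeps its inner spaces
    return rows

cr_list = parse_rows(info)
for row in cr_list:
    print(row)
```

Because split(None, 3) splits on any run of whitespace and stops after three splits, the title column keeps its internal spaces, and no regexp is needed.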