My function takes an input file that must follow a list of rules. If the file violates any of them, I want my program to print an error message and quit.
- the first line should start with a ‘#’ symbol (indicating a header line)
- every line should have exactly 10 columns
- column 2 (counting from 0) should be either a + or - symbol
- column 8 should be a comma-separated list of integers
- column 9 should be a comma-separated list of integers, with exactly the same number of integers as column 8.
To attempt to do this I have written the following code:
(Edited to incorporate steveha’s answer into the code.)
with open(infile, 'r') as fp:
    line = fp.readline().strip()
    if not line.startswith('#'):
        print('First line does not start with #')
        sys.exit(1)
    n = 1
    for line in fp.readlines():
        d = '(\d+,\d+)'
        n += 1
        cols = line.strip().split()
        i = search(d, line)
        if len(cols) != 10:
            print('Lenth not equal to 10')
            sys.exit(1)
        if cols[2] != '+' or '-':
            print('Column 2 is not a + or - symbol')
            sys.exit(1)
        if i and cols[8] != i.group(1):
            print('Column 8 is not a comma-separated list of integers')
            sys.exit(1)
        if i and cols[9] != i.group(1) and len(cols[9]) != len(cols[8]):
            print('Column 9 in not a comma-separated list of integers with the exact same number of integers in column 8')
            sys.exit(1)
Yet when I run this, not all of the conditionals appear to work. Am I doing this correctly?
This line is wrong:
This would be correct:
I would suggest this instead:
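A minimal sketch of why that kind of test misfires, assuming the line in question is the cols[2] != '+' or '-' check from the question. The three function names are hypothetical and exist only for this illustration.

```python
def buggy_check(value):
    # Parsed as (value != '+') or '-'. The non-empty string '-' is truthy,
    # so the whole expression is truthy for every input.
    return bool(value != '+' or '-')


def explicit_check(value):
    # Both comparisons are spelled out and joined with "and".
    return value != '+' and value != '-'


def membership_check(value):
    # Membership test: shorter, and easy to extend with more symbols.
    return value not in ('+', '-')


# Each function answers "is this value invalid?"
for value in ('+', '-', 'x'):
    print(value, buggy_check(value), explicit_check(value), membership_check(value))
```

The buggy version reports every value as invalid, including '+' and '-', while the other two only reject 'x'.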
Also, I suggest you not call fp.readlines(). The file object fp works as an iterator, and when you iterate it you get one line at a time, so your loop can simply be written as for line in fp.
Also, it looks like n is keeping track of the line number? In that case, there is an idiomatic Python way to do it: enumerate() takes an iterator and returns the next value from the iterator together with an incrementing count. By default the count starts at 0, but you can optionally specify a starting number, for example 1 so that the count matches the line number.
And it is best practice in Python to use the with statement to open files, so I suggest you open the file that way.
The code you are showing does not fully make sense to me. Take the line that assigns the result of search(d, line) to i. You must have already done a from re import search command. I actually recommend just doing import re and then explicitly calling re.search(), but I guess that is a matter of preference. Anyway, this sets i to the match object returned by re.search() (or to None if the match fails). But later on in the code you are testing r rather than i, and you never set r in any code we see here, so I am not sure what that will do. Personally I use m as the variable name for a match object.
Your regular expression just matches a pair of non-negative integers separated by a comma. Nothing there counts how many integers there are. len(cols[8]) is checking how many characters are in cols[8].
You are calling the string method .split(''), which is not correct. On my system it raises an exception: ValueError: empty separator. Just call .split() to split on white space; I’ll assume that the comma-separated integer lists must not contain any white space.
Finally, please consider the guidelines in PEP 8. Your variable FirstLine is capitalized like a class name rather than a variable name; that didn’t exactly confuse me, but it was sort of distracting. Most of the Python community follows PEP 8. http://www.python.org/dev/peps/pep-0008/
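The points above about iteration, counting and splitting can be sketched as follows. This is only an illustration: the sample text is made up, and io.StringIO stands in for a real file opened with a with statement.

```python
import io

# Hypothetical two-line input; io.StringIO behaves like an open text file.
sample = io.StringIO("# header\na b + d e f g h 1,22,333 4,5,6\n")

# A file object is its own iterator, so readlines() is not needed.
# enumerate(..., 1) pairs each line with a 1-based line number.
numbered = [(n, line.strip()) for n, line in enumerate(sample, 1)]

# No argument: split on any run of whitespace.
cols = numbered[1][1].split()

# len() on the raw column counts characters, not integers.
char_count = len(cols[8])
# Splitting on commas first gives the number of list items.
int_count = len(cols[8].split(','))

# An empty separator is rejected with ValueError.
try:
    "a b".split('')
    empty_sep_error = None
except ValueError as exc:
    empty_sep_error = str(exc)

print(numbered[0], len(cols), char_count, int_count, empty_sep_error)
```

For the column "1,22,333" the character count is 8 while the item count is 3, which is why comparing len(cols[8]) with len(cols[9]) does not compare the number of integers.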
Taking all of the above into account, I simply re-wrote your code:
I wrote a simple function to parse out the list of integers, build a Python list, and return it. Then the code can actually check properly whether the two lists are the same length.
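What follows is a minimal sketch of such a rewrite, not the code from the original answer. The names parse_int_list, find_error and check_file are hypothetical. It assumes that columns are separated by white space, that integers are unsigned, and that the header line is exempt from the column checks, as in the question's own code.

```python
import re
import sys


def parse_int_list(text):
    """Return the integers in text as a list, or None if text is malformed."""
    # One or more digit runs separated by single commas, e.g. "1,22,333".
    if not re.fullmatch(r'\d+(,\d+)*', text):
        return None
    return [int(item) for item in text.split(',')]


def find_error(lines):
    """Return a message for the first rule violation, or None if all pass."""
    for n, line in enumerate(lines, 1):
        if n == 1:
            if not line.startswith('#'):
                return 'line 1: first line does not start with #'
            continue
        cols = line.split()
        if len(cols) != 10:
            return 'line %d: expected 10 columns, found %d' % (n, len(cols))
        if cols[2] not in ('+', '-'):
            return 'line %d: column 2 is not a + or - symbol' % n
        list8 = parse_int_list(cols[8])
        if list8 is None:
            return 'line %d: column 8 is not a comma-separated list of integers' % n
        list9 = parse_int_list(cols[9])
        if list9 is None:
            return 'line %d: column 9 is not a comma-separated list of integers' % n
        if len(list8) != len(list9):
            return 'line %d: columns 8 and 9 have different lengths' % n
    return None


def check_file(path):
    """Print the first violation and exit with status 1."""
    with open(path) as fp:
        error = find_error(fp)
    if error is not None:
        print(error)
        sys.exit(1)
```

Keeping the checks in find_error, which accepts any iterable of lines, means the rules can be tried on a plain list of strings without touching the file system; check_file only adds the file handling and the exit.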