I’m writing a script to parse text files (CSV, to be precise), and I want to pick lines from the files based on each line’s content. There are a number of string conditions to check, so I surmised that regexp is the way to go. However, I also need to check a number at the beginning of a line against conditions in modular arithmetic; so far these are n%4==k and n%2==k. It seems that there are only ad hoc solutions. n%2==k is pretty straightforward, but to check n%4==2 I had to devise something like this:
r'((^\d*[24680]|^)[26]|^\d*[13579][048])[\s;,].*' # more (unrelated) conditions follow
My questions are:
- Is there a way to simplify the regexp above? Are there any obvious problems with it?
- If I want to generalize the script to other modulo conditions (e.g. n%3==k or n%7==k), is there a feasible way to do it with regexp, or would I be better off extracting the number from the string and writing additional code to check such conditions?
This seems to be more accurate for n%4==2 (ref: http://en.wikipedia.org/wiki/Divisibility_rule).
For n%3==0, see Regex filter numbers divisible by 3.
I’m not aware of any generic solution for mod n; in any case, it would be an interesting but purely theoretical exercise. In real life, just use ints.
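For what it's worth, the "just use ints" route could look roughly like the sketch below. It assumes the number is a run of decimal digits at the very start of the line, followed by one of the separators from the question's pattern (whitespace, `;` or `,`). The names `LEADING_INT` and `leading_number_matches` are made up for illustration.

```python
import re

# Hypothetical pattern: digits at the start of the line, followed by one of
# the separators used in the question's regexp (whitespace, ';' or ',').
LEADING_INT = re.compile(r'^(\d+)[\s;,]')


def leading_number_matches(line, modulus, remainder):
    """Return True if line starts with an integer n and n % modulus == remainder."""
    m = LEADING_INT.match(line)
    if m is None:
        # No leading number (or no separator after it): treat as no match.
        return False
    return int(m.group(1)) % modulus == remainder


# Sample lines invented for the example.
lines = ["6;foo", "10,bar", "12 baz", "x;1", "7;qux"]

# Keep only the lines whose leading number satisfies n % 4 == 2.
selected = [ln for ln in lines if leading_number_matches(ln, 4, 2)]
print(selected)
```

The remaining string conditions can still be checked with regexps on the same line. Only the arithmetic part moves into ordinary code, so switching to n%3==k or n%7==k is just a change of arguments.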