Have:
f = open(...)
r = re.compile(...)
Need:
Find the position (start and end) of the first regexp match in a big file
(starting from current_pos=...).
How can I do this?
I want to have this function:
def find_first_regex_in_file(f, regexp, start_pos=0):
    f.seek(start_pos)
    # .... (searching f for regexp starting from start_pos) HOW?
    return [match_start, match_end]
File ‘f’ is expected to be big.
One way to search through big files is to use the mmap library to map the file into a big memory chunk. Then you can search through it without having to explicitly read it. For example, something like:
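A minimal sketch of the mmap approach, filling in the function from the question. It assumes f was opened in binary mode ('rb'), that regexp was compiled from a bytes pattern such as re.compile(rb'...'), and that the file is not empty (mmap cannot map an empty file). Returning None when nothing matches is my own choice, since the question does not say what should happen in that case.

```python
import mmap
import re


def find_first_regex_in_file(f, regexp, start_pos=0):
    """Return [match_start, match_end] for the first match at or after
    start_pos, or None if there is no match.

    Assumptions: f is opened in binary mode, regexp is a compiled
    bytes pattern, and the file is non-empty.
    """
    # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
    data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        # The re module accepts any bytes-like object, including an mmap.
        # The second argument is the offset at which the search starts.
        match = regexp.search(data, start_pos)
        if match is None:
            return None
        return [match.start(), match.end()]
    finally:
        data.close()
```

No f.seek() call is needed here, because start_pos is passed straight to search() as the starting offset. The returned positions are byte offsets from the beginning of the file, not from start_pos.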
This works well for very big files (I’ve done it for a file 30+ GB in size, but you’ll need a 64-bit OS if your file is more than a GB or two).