I have a big log file in which the records are sorted by time. Each line has a timestamp. I need to find all the records between time T1 and time T2 (T1 <= T2). I could scan the whole file line by line, find the first line with time T1, copy it into a buffer, and keep scanning until I reach the end time T2. This works, but it is not very efficient.
I wonder if I can use binary search to locate the lines with time T1 and T2. But I am not sure how to determine the following:
- The middle line of the file
- How to determine the offset we should pass to lseek()
Is it possible to use binary search on a file?
Let us assume that all your lines are reasonably close to the average length (meaning no single line takes up half of the log or so), which makes binary search feasible.
Next, I will also assume you have the following functions:
With these functions we can implement the following:
I do not guarantee that this works (in all corner cases), but it sketches the overall implementation. One would write another function, upper_bound, in the same way; together the two give the start and end of the lines that lie within your bounds.