I need help in deciding with search algorithm to use for searching large files.
here is what I am doing. Lets say file consists of time range t1 to t2. (t2>t1)
I need to get file offsets (fseek) of:
- time t3 that is bigger than t1
-
time t4 that is smaller than time t2
| ------| ---|----------------| t1 t3 t4 t2
Naive version is to iterate lines through whole file and return fseek when current time is t3, start from returned seek and iterate while current time is t4, return second fseek
Now lets say file is 100GB and i need to iterate while file just to get period of 2 seconds.
Then this logic becomes too CPU and filesystem expensive. Looking for better solutions. Language in use is C.
Lines are currently fixed size but I’d like to look into future and deal with some algorithm that doesn’t use fixed size lengths.
You can use a binary search if the times in the file are all sorted. Even better if the records in your file are of a fixed width, but you probably can make use of it even if they are not, with some work.