I have a log file which has a format of this kind:
DATE-TIME ### attribute1 ### attribute2 ### attribute3
I need to search this log file for an attribute entered on the command line and print the lines that match it.
A naive approach would be something like this:
scan the entire file line by line, search each line for the attribute, and print the line if it is found; otherwise ignore it.
This approach is slow because every search requires O(n) comparisons, where n is the number of lines, which may be very large.
Another approach would be to use a hash table, but keeping an in-memory hash table for a big file may not be possible.
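For reference, the naive scan described above could be sketched in Python roughly as follows. The function names `parse_line` and `scan` are made up for illustration, and the sketch assumes that fields are separated by `###` with optional surrounding spaces and that a match means an exact match on one of the attribute fields.

```python
def parse_line(line):
    """Split 'DATE-TIME ### a1 ### a2 ### a3' into whitespace-stripped fields."""
    return [field.strip() for field in line.rstrip("\n").split("###")]


def scan(lines, wanted):
    """Return every line that has `wanted` as one of its attribute fields.

    `lines` can be any iterable of strings, e.g. an open text file.
    Each call walks the whole input once, so a search costs O(n).
    """
    matches = []
    for line in lines:
        fields = parse_line(line)
        # fields[0] is the timestamp; only the attributes are compared.
        if wanted in fields[1:]:
            matches.append(line.rstrip("\n"))
    return matches
```

Passing an open file object as `lines` reads it lazily, so memory use stays small even though each search still touches every line.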
So, what is the best feasible solution? How can I possibly index the entire file on various attributes?
EDIT:
The log file may be about 100K lines, much like the system log files on Linux.
Within one invocation, a user may search for multiple attributes, and each subsequent attribute is not known until the search for the previous one has completed, as in an interactive console.
You can reduce the size of the hash table by only storing hash values and file offsets in it. If the attributes only have a fixed, relatively small number of values, you are more likely to be able to fit the whole hash table in memory. You assign an id to each possible value of the attribute, and then for each id value store a big list of file offsets.
Of course the hash table is only going to be helpful if, within the same run of the program, you do several different searches.
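A minimal sketch of that idea in Python, under a few assumptions: the log is opened in binary mode so that offsets are exact byte positions, fields are separated by `###`, the text is UTF-8, and CRC32 (via `zlib.crc32`) is used as the stored hash value. The names `build_index` and `lookup` are hypothetical. Because only hashes are kept, two different values can collide, so each candidate line is re-checked after it is read back.

```python
import zlib
from collections import defaultdict

SEPARATOR = b"###"


def attribute_fields(line):
    """Return the stripped attribute fields of a raw line (timestamp dropped)."""
    return [field.strip() for field in line.split(SEPARATOR)][1:]


def build_index(log):
    """Map hash(attribute value) -> list of byte offsets of lines containing it.

    `log` must be a seekable file object opened in binary mode.
    Only integers (hashes and offsets) are stored, not the attribute strings.
    """
    index = defaultdict(list)
    log.seek(0)
    while True:
        offset = log.tell()
        line = log.readline()
        if not line:
            break
        # set() avoids recording the same offset twice for a repeated value.
        for value in set(attribute_fields(line)):
            index[zlib.crc32(value)].append(offset)
    return index


def lookup(log, index, wanted):
    """Return the lines whose attributes include `wanted`, reading only those lines."""
    needle = wanted.encode("utf-8")
    results = []
    for offset in index.get(zlib.crc32(needle), []):
        log.seek(offset)
        line = log.readline()
        # Re-check the actual value to rule out hash collisions.
        if needle in attribute_fields(line):
            results.append(line.decode("utf-8").rstrip("\r\n"))
    return results
```

The index is built once with a single pass over the file. After that, each interactive query only seeks to the lines that can match, which is where the benefit over repeated full scans comes from.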
The obvious solution would be to stuff the data in a database, but I assume that the OP is smart enough to have realized that already and has other reasons for specifically requesting a non-database solution to the problem.