I am writing an awk script in which I want to search for some strings in a file. The problem I am facing is …
The file is extremely large: around 1 million lines.
If I search for a string that is on the last line of the file, I have unnecessarily traversed all the preceding lines. So I am looking for a command that, given the string as an argument, gives me its line number in the file. Alternatively, I would like to do a binary search on the file. Any pointers on this would be appreciated.
Just an additional note: it is not a single string; I have multiple strings to search for at one time.
Regardless of what you do, if the data is in a file, it will have to be read into memory before you can do any processing (no matter how efficient), sorting, searching, etc.
Are you running out of memory, or are you concerned about time? If memory isn’t an issue, 1 million records don’t seem that big these days.
If you just want to determine whether a certain string is present in your data file, you could try using grep. E.g., grep with the -n option will print the line and line number if the target was found in the file. More information is on the grep man page.
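Since you mentioned having several strings to look for at once, here is a minimal sketch of how that might look with grep. The file name data.txt and its contents are made up for illustration; -n, -F and -e are standard grep options.

```shell
# Build a small sample file (hypothetical data) so the example is self-contained.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$tmpdir/data.txt"

# -n prefixes each matching line with its line number.
# -F treats the patterns as fixed strings rather than regular expressions.
# Each -e adds one more pattern, so all strings are found in a single pass.
result=$(grep -n -F -e 'beta' -e 'delta' "$tmpdir/data.txt")
printf '%s\n' "$result"

# Remove the temporary sample data.
rm -rf "$tmpdir"
```

If the list of strings is long, putting them one per line in a file and passing that file with -f should work the same way.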
If you want to locate and then process a line in the file, then grep won't work, and you'll have to use awk (like you mentioned), or look at sed or writing a custom script in Python or some other language. In all cases, the file will have to be read one way or the other. Perhaps breaking the file into chunks and then processing a specific part would help (if you can determine ahead of time where to search, though that sounds unlikely from your question).
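As a rough sketch of the awk route, something like the following might do. The file names data.txt and targets.txt, the sample contents, and the "processing" step (just printing the line number and line) are all assumptions for illustration. It reads the list of strings first, then scans the data once and stops as soon as every string has been seen. Note that this only saves time when the last match is not near the end of the file; the lines before a match still have to be read.

```shell
# Hypothetical sample data and list of strings to search for.
tmpdir=$(mktemp -d)
printf 'alpha 1\nbeta 2\ngamma 3\ndelta 4\n' > "$tmpdir/data.txt"
printf 'beta\ndelta\n' > "$tmpdir/targets.txt"

found=$(awk '
    # First file: remember each distinct target string.
    NR == FNR {
        if (!($0 in want)) { want[$0]; ntargets++ }
        next
    }
    # Second file: check every target not yet seen against the current line.
    {
        for (t in want) {
            if (!(t in seen) && index($0, t)) {
                seen[t]
                print FNR ": " $0        # placeholder for real processing
                if (++nseen == ntargets) exit   # stop reading once all are found
            }
        }
    }
' "$tmpdir/targets.txt" "$tmpdir/data.txt")
printf '%s\n' "$found"

# Remove the temporary sample data.
rm -rf "$tmpdir"
```

Here index() does a plain substring match, so characters that are special in regular expressions need no escaping in the strings you search for.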