I have a text file with about 100,000 lines (5 MB), which is updated once a day. It grows at a rate of about 30 lines a day. The lines are not sorted in any way. Each line is 50 hex characters long and looks like this:
ABCDE9DAF1F66C10C02F25A1685821F8428422F5870F39A3FE
Given one of these strings, I need to figure out if it exists in this file. I am working with C# (.NET CF 2.0) on a handheld device, so memory is limited. I have the ability to process the file beforehand on a Windows server. What is the fastest way to do this? Some of my initial ideas include: sorting the file, a line-by-line string comparison, creating a binary file to search, or using SQLite.
From the OP’s comments (an important detail that was initially left out of the question):
The file is read-only. No changes will ever be made by my programs. I get a new version of the file each day with more strings appended to the end.
The optimal way to do this would probably be to pre-sort the file on the server and use memory-mapped files to binary-search it. That said, .NET CF 2.0 doesn’t support memory-mapped files.
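You’re probably best off just pre-sorting the file and using stream access to perform a binary search on it. Because every line has the same length, the byte offset of any line can be computed directly from its index, so each probe is a single seek plus one short read, and about 17 probes are enough for 100,000 lines. It’s not great because the reads aren’t sequential, but seeing as you’re on CF, there is a good chance the data store on the device is flash-based, so the random access for the binary search probably won’t be too bad.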
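As a rough illustration of that approach, here is a minimal sketch, not a tested CF implementation. The class name `SortedHexFile` and the methods `PreSort` and `Contains` are made up for this example. It assumes the server rewrites the file as pure ASCII with exactly one `\n` after each 50-character line (so every record is 51 bytes), that the hex case is consistent, and that the sort is ordinal so the order matches the byte-wise comparison done on the device.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper; the names and the 51-byte record layout are assumptions.
public static class SortedHexFile
{
    private const int KeyLength = 50;               // hex characters per line (from the question)
    private const int RecordLength = KeyLength + 1; // assumption: a single '\n' ends each line

    // Server side: sort ordinally and rewrite as fixed-width ASCII records.
    public static void PreSort(string inputPath, string outputPath)
    {
        string[] lines = File.ReadAllLines(inputPath);
        Array.Sort(lines, StringComparer.Ordinal);
        using (StreamWriter writer = new StreamWriter(outputPath, false, Encoding.ASCII))
        {
            writer.NewLine = "\n";
            foreach (string line in lines)
            {
                // Skip blank or malformed lines so every record has the same width.
                if (line.Length == KeyLength) writer.WriteLine(line);
            }
        }
    }

    // Device side: binary search using Seek, holding only one record in memory.
    public static bool Contains(string sortedPath, string key)
    {
        if (key == null || key.Length != KeyLength) return false;
        byte[] target = Encoding.ASCII.GetBytes(key);
        byte[] buffer = new byte[KeyLength];
        using (FileStream stream = new FileStream(sortedPath, FileMode.Open, FileAccess.Read))
        {
            long low = 0;
            long high = stream.Length / RecordLength - 1;
            while (low <= high)
            {
                long mid = low + (high - low) / 2;
                stream.Seek(mid * RecordLength, SeekOrigin.Begin);
                if (!ReadFully(stream, buffer)) return false;
                int cmp = CompareBytes(buffer, target);
                if (cmp == 0) return true;
                if (cmp < 0) low = mid + 1; else high = mid - 1;
            }
        }
        return false;
    }

    // Stream.Read may return fewer bytes than requested, so loop until full.
    private static bool ReadFully(Stream stream, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = stream.Read(buffer, offset, buffer.Length - offset);
            if (read <= 0) return false;
            offset += read;
        }
        return true;
    }

    // Byte-wise comparison; for ASCII this matches the ordinal sort order.
    private static int CompareBytes(byte[] a, byte[] b)
    {
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
        }
        return 0;
    }
}
```

On the server you would call `SortedHexFile.PreSort(rawPath, sortedPath)` once a day after the new file arrives, and on the device `SortedHexFile.Contains(sortedPath, candidate)` for each lookup. If the file you ship uses `\r\n` line endings instead, `RecordLength` would have to be 52.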