I have a list of about 7 million strings in a text file of size 152MB. What would be the best way to implement a function that takes a single string and returns whether it is in that list?
Are you going to have to match against this text file several times? If so, I'd create a `HashSet<string>`. Otherwise, just read it line by line (I'm assuming there's one string per line) and see whether it matches.

152MB of ASCII will end up as over 300MB of Unicode data in memory, because .NET strings store each character as a two-byte UTF-16 code unit. Modern machines have plenty of memory, though, so keeping the whole lot in a `HashSet<string>` will make repeated lookups very fast indeed.

The absolute simplest way to do this is probably to use `File.ReadAllLines`, although that will create an array which will then be discarded. That's not great for memory usage, but probably not too bad:
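As a minimal sketch of the two options described above, assuming one string per line and exact, case-sensitive matching: the class name `StringListLookup` and the method names `LoadSet` and `ContainsLine` are made up for illustration, and the explicit `StringComparer.Ordinal` is an assumption about how matches should be compared.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper class; the names are illustrative only.
public static class StringListLookup
{
    // Repeated lookups: read the whole file once and keep it in a set.
    // File.ReadAllLines builds a temporary string[] that becomes garbage
    // as soon as the HashSet has been constructed from it.
    public static HashSet<string> LoadSet(string path)
    {
        return new HashSet<string>(File.ReadAllLines(path), StringComparer.Ordinal);
    }

    // One-off lookup: File.ReadLines enumerates the file lazily, so the
    // scan stops at the first matching line and never holds the whole
    // file in memory.
    public static bool ContainsLine(string path, string candidate)
    {
        return File.ReadLines(path)
                   .Any(line => string.Equals(line, candidate, StringComparison.Ordinal));
    }
}
```

With a file called, say, `strings.txt`, repeated checks would look like `var known = StringListLookup.LoadSet("strings.txt");` followed by `known.Contains(candidate)` for each query. If the temporary array turns out to matter, passing `File.ReadLines(path)` to the `HashSet<string>` constructor instead should avoid it, at the cost of the set resizing as it grows.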