While running some final tests on a class library that I’m writing for Windows Mobile (using .NET Compact Framework 2.0), I ran into an OutOfMemoryException (OOM).
Basically, my library first loads a dictionary file (an ordinary text file with a word list) and then a second file based on the dictionary (I call it KeyMap), whose size is more or less the same as that of the dictionary.
Everything worked fine with these files (on both the emulator and my real device) until I tried to load a Spanish dictionary of approximately 2.7 MB. The other language dictionaries I have used so far without any OOM exceptions are approximately 1.8 MB each. With the Spanish dictionary, I can load the first file without any problems, but when I try to read the second file, I get the OOM error.
Below is the code I am using. I read each file and assign its contents to a string variable (DictData and TextKeyMap). Then I call Split on that string to put its contents into a string array (Dict and KeyMap).
'Loading Dictionary works
Dim ReadDictionary As StreamReader = New StreamReader(DictPath, Encoding.UTF8)
DictData = ReadDictionary.ReadToEnd()
ReadDictionary.Close()
Dict = DictData.ToString.ToUpper.Split(mySplitSep.ToCharArray) 'mySplitSep=chr(10)
DictData = "" 'perhaps "nothing" is better
'Loading KeyMap gives me error
Dim ReadHashKeyMap As StreamReader = New StreamReader(HashKeyMapPath, Encoding.UTF8)
TextKeyMap = ReadHashKeyMap.ReadToEnd() '<-- OOM-error
ReadHashKeyMap.Close()
KeyMap = TextKeyMap.ToString.Split(mySplitSep.ToCharArray) 'mySplitSep=chr(10)
TextKeyMap = "" 'perhaps "nothing" is better
I am a hobby programmer with no expert knowledge, so my code above can probably be improved. Instead of using ReadToEnd, I tried reading each line in a For loop, but I got the same error (and it was also slower).
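For reference, the following is a minimal sketch of what a line-by-line load might look like. The module and function names (WordListLoader, LoadWords) are hypothetical and not part of the original library.

This approach avoids building one multi-megabyte string with ReadToEnd and then copying it again with ToUpper and Split. However, every word is still a separate string in memory, so, as reported above, it may not be enough on its own to stay within the device's memory limit. It also behaves slightly differently from Split on Chr(10): ReadLine drops a trailing carriage return, and it does not produce the empty last element that Split yields when the file ends with a newline.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

' Hypothetical helper module; not part of the original library.
Public Module WordListLoader

    ' Reads a UTF-8 word list one line at a time instead of calling
    ' ReadToEnd, so no single string holding the whole file is created.
    Public Function LoadWords(ByVal path As String, ByVal toUpper As Boolean) As String()
        Dim words As New List(Of String)()
        Using reader As New StreamReader(path, Encoding.UTF8)
            Dim line As String = reader.ReadLine()
            Do While line IsNot Nothing
                ' Upper-case each word individually, as the original did for Dict.
                If toUpper Then line = line.ToUpper()
                words.Add(line)
                line = reader.ReadLine()
            Loop
        End Using ' the reader is closed here even if an exception is thrown
        ' ToArray copies only the string references, not the strings themselves.
        Return words.ToArray()
    End Function

End Module
```

Usage would mirror the original code, for example Dict = LoadWords(DictPath, True) and KeyMap = LoadWords(HashKeyMapPath, False).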
I presume the error is due to the limitation of 32MB of contiguous memory in Windows Mobile.
Can any of you help me out, perhaps by suggesting some alternative solutions? Maybe the problem is due to my crappy code above? What about loading the second file in another thread? Could that work?
Any help I can get will be highly appreciated.
Edit: I asked a similar question some time ago (here), but that one was more about handling received bytes and was resolved by using chunks. In this case, I am dealing with strings.
Edit 2: This library is a spell-checking library. It works quite well and implements some fairly advanced techniques, such as the Soundex and Double Metaphone algorithms. The only major problem so far is the one described above with the huge Spanish text file; the other dictionaries are fine. For more info, please see this link.
As you’ve not said what you’re using this file for, I’m assuming that you are just searching for a word for some reason.
First of all, it’s probably not a good idea to try to load the complete file into memory. Instead, it might be more productive to search the file for the data (word) you need and, perhaps, keep some sort of indexing information in memory to speed things up a bit.
As the data you are trying to search is just a list of words, it might be a good idea to scan the file and record in a dictionary where the first letter of the words changes (this assumes the word list is sorted alphabetically), e.g. A’s start at line 0, B’s start at line 200, C’s start at line 300, etc. Use these two pieces of information to populate your dictionary: the letter is the key and the line number is the value. In effect, the dictionary becomes a high-level index into the word-list file. This dictionary is also very small.
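The following is a minimal sketch of building that first-letter index in VB.NET. The names (WordIndex, BuildLetterIndex, SkipBom, AddLine) are hypothetical. The sketch makes these assumptions about the word list:

- It is UTF-8.
- It has one word per line, separated by LF (Chr(10), as in the question).
- It is sorted so that all words sharing an upper-cased first letter are contiguous.

The index records byte offsets rather than line numbers, because FileStream.Seek positions the stream by byte. The sketch reads raw bytes instead of using a StreamReader because a StreamReader buffers ahead, so its BaseStream.Position does not tell you where the current line starts. The whole file is never held in memory at once.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

' Hypothetical module: builds a first-letter index into a sorted word list.
Public Module WordIndex

    ' Returns, for each upper-cased first letter, the byte offset of the first
    ' line starting with that letter. Assumes UTF-8, one word per line (LF),
    ' sorted so that words sharing a first letter are contiguous.
    Public Function BuildLetterIndex(ByVal path As String) As Dictionary(Of Char, Long)
        Dim index As New Dictionary(Of Char, Long)()
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            Dim lineStart As Long = SkipBom(fs)
            Dim lineBytes As New List(Of Byte)()
            Dim b As Integer = fs.ReadByte()
            Do While b <> -1
                If b = 10 Then
                    ' End of a line: record it; the next line starts right after the LF.
                    AddLine(index, lineBytes, lineStart)
                    lineBytes.Clear()
                    lineStart = fs.Position
                Else
                    lineBytes.Add(CByte(b))
                End If
                b = fs.ReadByte()
            Loop
            AddLine(index, lineBytes, lineStart) ' the last line may have no trailing LF
        End Using
        Return index
    End Function

    ' Skips a UTF-8 byte-order mark if present; returns the offset of the first line.
    Private Function SkipBom(ByVal fs As FileStream) As Long
        Dim bom(2) As Byte
        Dim n As Integer = fs.Read(bom, 0, 3)
        If n = 3 AndAlso bom(0) = &HEF AndAlso bom(1) = &HBB AndAlso bom(2) = &HBF Then
            Return 3
        End If
        fs.Seek(0, SeekOrigin.Begin)
        Return 0
    End Function

    ' Decodes one line and records its first letter the first time that letter is seen.
    Private Sub AddLine(ByVal index As Dictionary(Of Char, Long), ByVal lineBytes As List(Of Byte), ByVal lineStart As Long)
        Dim word As String = Encoding.UTF8.GetString(lineBytes.ToArray(), 0, lineBytes.Count).TrimEnd(ControlChars.Cr)
        If word.Length = 0 Then Return
        Dim letter As Char = Char.ToUpper(word(0))
        If Not index.ContainsKey(letter) Then index.Add(letter, lineStart)
    End Sub

End Module
```

The index only needs to be built once, for example at start-up. It holds at most a few dozen entries, so it costs almost nothing compared with loading the word list itself.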
Then, when you search for a word, use the first letter of the word to look it up in the dictionary. This gives you the line number where words beginning with that letter are located in the file. Armed with that line number, (re)open the file and go straight to that line by moving the stream pointer to it. Note that a stream seeks to byte positions, not line numbers, so in practice it is simpler to record each letter’s byte offset in the dictionary instead of its line number. Then search for the target word from there. Either search sequentially, a line at a time (not recommended, as it will be quite slow, but it is easier to code), or search for the word using a binary chop (much quicker, but harder to code). For the latter, though, you’ll also need to know where the words that start with the target letter stop in the file, as you’ll be searching a section of the file. I’d also recommend that you do the word search in the file rather than loading all those words into memory; otherwise you might end up back where you started, with OOM errors.
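The binary chop could be sketched as follows. This is only a sketch under these assumptions:

- The index maps each upper-cased first letter to the byte offset where that letter's section begins, built once by scanning the file as described above.
- The file is UTF-8, with one word per LF-terminated line.
- The file is sorted in the order given by String.CompareOrdinal on the upper-cased words.

All names (WordSearch, ContainsWord, LineStartAtOrAfter, ReadLineAt) are hypothetical. The section's end is taken to be the start of the next letter's section, or the end of the file. Each step seeks to the middle of the remaining byte range, skips forward to the start of the next full line, and reads just that one word to compare with the target. As a result, only a handful of lines are read per lookup.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

' Hypothetical module: binary chop over one letter's section of a sorted word list.
Public Module WordSearch

    ' index maps an upper-cased first letter to the byte offset where that
    ' letter's section starts. Assumes UTF-8, one word per LF-terminated line,
    ' sorted by String.CompareOrdinal of the upper-cased words.
    Public Function ContainsWord(ByVal path As String, ByVal index As Dictionary(Of Char, Long), ByVal word As String) As Boolean
        If word Is Nothing OrElse word.Length = 0 Then Return False
        Dim letter As Char = Char.ToUpper(word(0))
        If Not index.ContainsKey(letter) Then Return False
        Dim target As String = word.ToUpper()

        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            ' The section ends where the next letter's section begins (or at EOF).
            Dim sectionStart As Long = index(letter)
            Dim sectionEnd As Long = fs.Length
            For Each start As Long In index.Values
                If start > sectionStart AndAlso start < sectionEnd Then sectionEnd = start
            Next

            ' Invariant: if present, the word is on a line starting in [lo, hi).
            Dim lo As Long = sectionStart
            Dim hi As Long = sectionEnd
            Do While lo < hi
                Dim middle As Long = lo + (hi - lo) \ 2
                Dim s As Long = LineStartAtOrAfter(fs, middle, sectionStart)
                If s >= hi Then
                    hi = middle ' no line starts in [middle, hi)
                Else
                    Dim nextPos As Long
                    Dim cmp As Integer = String.CompareOrdinal(ReadLineAt(fs, s, nextPos).ToUpper(), target)
                    If cmp = 0 Then Return True
                    ' Word at s is smaller: search after it; larger: search before it.
                    If cmp < 0 Then lo = nextPos Else hi = s
                End If
            Loop
        End Using
        Return False
    End Function

    ' Returns the first line start at or after pos. Every line start except
    ' sectionStart itself directly follows an LF byte.
    Private Function LineStartAtOrAfter(ByVal fs As FileStream, ByVal pos As Long, ByVal sectionStart As Long) As Long
        If pos <= sectionStart Then Return sectionStart
        fs.Seek(pos - 1, SeekOrigin.Begin)
        Dim b As Integer = fs.ReadByte()
        Do While b <> -1 AndAlso b <> 10
            b = fs.ReadByte()
        Loop
        Return fs.Position
    End Function

    ' Reads the line starting at pos; nextPos receives the start of the following line.
    Private Function ReadLineAt(ByVal fs As FileStream, ByVal pos As Long, ByRef nextPos As Long) As String
        fs.Seek(pos, SeekOrigin.Begin)
        Dim bytes As New List(Of Byte)()
        Dim b As Integer = fs.ReadByte()
        Do While b <> -1 AndAlso b <> 10
            bytes.Add(CByte(b))
            b = fs.ReadByte()
        Loop
        nextPos = fs.Position
        Return Encoding.UTF8.GetString(bytes.ToArray(), 0, bytes.Count).TrimEnd(ControlChars.Cr)
    End Function

End Module
```

A lookup would then be something like ContainsWord(DictPath, index, "casa"). Byte 10 can never occur inside a multi-byte UTF-8 character, so scanning for it is safe even with accented Spanish letters. If your file is sorted with a culture-aware comparison rather than an ordinal one, the comparison in the loop would have to match that order, or the chop will miss words.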
If anything is unclear, stick a comment on here and I’ll do my best to answer it.
Good luck