We’ll soon be starting development of a new mobile application. This app will be used for heavy searching of text-based fields. Any suggestions from the group at large on which database engine is best suited to these types of searches on a mobile platform?
Specifics: Windows Mobile 6, and we’ll be using the .NET CF. Some of the text-based fields will be anywhere between 35 and 500 characters. The device will operate in two different modes, batch and WiFi. For WiFi, of course, we can just submit requests to a full-blown DB engine and fetch the results back. This question centres on the ‘batch’ version, which will house a database loaded with information on the device’s flash/removable storage card.
At any rate, I know SQLCE has some basic indexing, but you don’t get the really fancy ‘full text’ style indexes until you’ve got the full-blown version, which of course isn’t available on a mobile platform.
An example of what the data would look like:
‘apron carpenter adjustable leather container pocket waist hardware belt’ etc. etc.
I haven’t evaluated any other specific options yet, as I figured I’d first draw on the experience of this group to point me down some specific avenues.
Any suggestions/tips?
Just recently I had the same issue. Here is what I did:
I created a class to hold just an id and the text for each object (in my case I called it a sku (item number) and a description). This creates a smaller object that uses less memory since it is only used for searching. I’ll still grab the full-blown objects from the database after I find matches.
After this class is created, you can then create an array (I actually used a List in my case) of these objects and use it for searching throughout your application. The initialization of this list takes a bit of time, but you only need to worry about this at start up. Basically just run a query on your database and grab the data you need to create this list.
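As a rough sketch of the two steps above (the small search-only class and the one-time load), something like the following could work. The names `SearchItem`, `SearchIndex` and `Load` are made up for illustration, the sku is assumed to be an integer in column 0 with the description in column 1, and the reader is assumed to come from whatever query you run at start-up (for example a command against a hypothetical `Items` table).

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Lightweight search record: only the key and the searchable text.
// (Hypothetical class; the sku is assumed to be numeric.)
public sealed class SearchItem
{
    public readonly int Sku;
    public readonly string Description;

    public SearchItem(int sku, string description)
    {
        Sku = sku;
        Description = description ?? string.Empty;
    }
}

public static class SearchIndex
{
    // Builds the in-memory list from any open data reader whose rows are
    // (sku, description). Run this once at start-up, ideally off the UI thread.
    public static List<SearchItem> Load(IDataReader reader)
    {
        List<SearchItem> items = new List<SearchItem>();
        while (reader.Read())
        {
            int sku = reader.GetInt32(0);
            string description = reader.IsDBNull(1) ? string.Empty : reader.GetString(1);
            items.Add(new SearchItem(sku, description));
        }
        return items;
    }
}
```

Keeping only two fields per item is what keeps the memory footprint down; the full objects are still fetched from the database after a match.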
Once you have a list, you can quickly go through it searching for any words you want. Since it’s a ‘contains’ search, it must also find words within words (e.g. drill would return drill, drillbit, drills etc.). To do this, we wrote a home-grown, unmanaged C# contains function. It takes in a string array of words, so you can search for more than one word. We use it for ‘AND’ searches: the description must contain all of the words passed in (‘OR’ is not currently supported in this example). As it scans the list of items, it builds a list of IDs, which are then passed back to the calling function. Once you have a list of IDs, you can easily run a fast query in your database to return the full-blown objects based on a fast indexed ID number.
I should mention that we also limit the maximum number of results returned. This could be taken out. It’s just handy if someone types in something like ‘e’ as their search term, which is going to return a lot of results.
Here’s an example of the custom Contains function:
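What follows is only an illustrative sketch of the idea, not the exact code we run. The names `SearchItem`, `TextSearch`, `FindAll` and `maxResults` are assumptions, the comparison is ordinal and case-sensitive, and the pointer code needs the compiler’s /unsafe option.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical search record: item number plus searchable text.
public sealed class SearchItem
{
    public readonly int Sku;
    public readonly string Description;
    public SearchItem(int sku, string description)
    {
        Sku = sku;
        Description = description ?? string.Empty;
    }
}

public static class TextSearch
{
    // Ordinal substring test over raw char pointers (compile with /unsafe).
    private static unsafe bool Contains(string text, string word)
    {
        int n = text.Length, m = word.Length;
        if (m == 0) return true;   // an empty word matches everything
        if (m > n) return false;   // the word cannot fit in the text
        fixed (char* t = text, w = word)
        {
            int last = n - m;
            for (int i = 0; i <= last; i++)
            {
                if (t[i] != w[0]) continue;
                int j = 1;
                while (j < m && t[i + j] == w[j]) j++;
                if (j == m) return true;
            }
        }
        return false;
    }

    // 'AND' search: returns the skus whose description contains every word,
    // stopping as soon as maxResults matches have been collected.
    public static List<int> FindAll(List<SearchItem> items, string[] words, int maxResults)
    {
        List<int> matches = new List<int>();
        if (maxResults <= 0) return matches;
        foreach (SearchItem item in items)
        {
            bool all = true;
            foreach (string word in words)
            {
                if (!Contains(item.Description, word)) { all = false; break; }
            }
            if (all)
            {
                matches.Add(item.Sku);
                if (matches.Count >= maxResults) break;
            }
        }
        return matches;
    }
}
```

For a case-insensitive search, one option is to lower-case the descriptions once when the list is loaded and lower-case the search words before calling `FindAll`.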
Once you have the list of matching skus, you can iterate through the array and build a query command that only returns the matching skus.
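One way to build that command text, sketched under assumptions: the table name `Items` and column name `Sku` are hypothetical, and the skus are assumed to be integers, so they can be written straight into the SQL. With string keys you would want parameters instead of concatenation.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class SkuQuery
{
    // Builds "SELECT * FROM Items WHERE Sku IN (1, 2, 3)" from the matched skus.
    // Returns null when there is nothing to look up.
    public static string BuildSelect(IList<int> skus)
    {
        if (skus == null || skus.Count == 0) return null;

        StringBuilder sql = new StringBuilder("SELECT * FROM Items WHERE Sku IN (");
        for (int i = 0; i < skus.Count; i++)
        {
            if (i > 0) sql.Append(", ");
            sql.Append(skus[i].ToString(CultureInfo.InvariantCulture));
        }
        sql.Append(")");
        return sql.ToString();
    }
}
```

Because the match limit caps the number of skus, the IN list stays short, and the lookup itself runs against the indexed key.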
For an idea of performance, here’s what we have found (doing the following steps):
On our mobile units, the entire process takes 2-4 seconds (takes 2 if we hit our match limit before we have searched all items… takes 4 seconds if we have to scan every item).
I’ve also tried doing this without unmanaged code and using String.IndexOf (and tried String.Contains… had same performance as IndexOf as it should). That way was much slower… about 25 seconds.
I’ve also tried using a StreamReader and a file containing lines of [Sku Number]|[Description]. The code was similar to the unmanaged code example. This way took about 15 seconds for an entire scan. Not too bad for speed, but not great. The file and StreamReader method has one advantage over the way I showed you though. The file can be created ahead of time. The way I showed you requires the memory and the initial time to load the List when the application starts up. For our 171,000 items, this takes about 2 minutes. If you can afford to wait for that initial load each time the app starts up (which can be done on a separate thread of course), then searching this way is the fastest way (that I’ve found at least).
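For comparison, the file-based variant could look roughly like this. It is a sketch only: `FileSearch` and `Find` are made-up names, each line is assumed to be `sku|description`, and a managed ordinal `IndexOf` stands in for the unmanaged contains function, so it is not the code that produced the timings above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileSearch
{
    // Scans lines of the form "sku|description" and returns the skus whose
    // description contains every word, up to maxResults.
    public static List<string> Find(TextReader reader, string[] words, int maxResults)
    {
        List<string> skus = new List<string>();
        string line;
        while (skus.Count < maxResults && (line = reader.ReadLine()) != null)
        {
            int bar = line.IndexOf('|');
            if (bar < 0) continue; // skip malformed lines

            string description = line.Substring(bar + 1);
            bool all = true;
            foreach (string word in words)
            {
                if (description.IndexOf(word, StringComparison.Ordinal) < 0)
                {
                    all = false;
                    break;
                }
            }
            if (all) skus.Add(line.Substring(0, bar));
        }
        return skus;
    }
}
```

Taking a `TextReader` means the same method works with a `StreamReader` opened on the pre-built file on the storage card, and nothing has to be held in memory between searches.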
Hope that helps.
PS – Thanks to Dolch for helping with some of the unmanaged code.