Let's say I have three models/tables: operating_systems, words, and programming_languages:
# operating_systems
name:string  created_by:string  family:string
Windows      Microsoft          MS-DOS
Mac OS X     Apple              UNIX
Linux        Linus Torvalds     UNIX
UNIX         AT&T               UNIX
# words
word:string  definitions:string
window       (serialized hash of definitions)
hello        (serialized hash of definitions)
UNIX         (serialized hash of definitions)
# programming_languages
name:string  created_by:string  example_code:text
C++          Bjarne Stroustrup  #include <iostream> etc...
HelloWorld   Jeff Skeet         h
AnotherOne   Jon Atwood         imports 'SORULEZ.cs' etc...
When a user searches hello, the system shows the definitions of ‘hello’. This is relatively easy to implement. However, when a user searches UNIX, the engine must choose between word and operating_system. Also, when a user searches windows (lowercase ‘w’), the engine chooses word, but it should also show: Assuming 'windows' is a word. Use as an <a href="etc..">operating system</a> instead.
Can anyone point me in the right direction with parsing and choosing the topic of the search query? Thanks.
Note: it doesn’t need to be able to perform calculations as WA can do.
Have a new index table called terms that contains a tokenised version of each valid term. That way, you only have to search one table.
Then you can see how close a match the user's search term is. I.e. “Windows” would be a 100% match with 2, so assume that, but a close match to 1 also, so suggest that as an alternative. You'd have to write your own rules engine that decides how close a word matches (i.e. what gets assumed with “windows” vs “Windows”?). The Priority field could be the final decider if the rules engine can't decide, and could in theory be driven by user activity so it learns what users are more likely referring to.
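As a rough illustration of the idea, here is a minimal Ruby sketch that keeps the terms index in memory and ranks candidates for a query. Everything in it is an assumption made for the example: the column names (token, display, topic, priority), the rows and their ids, the scoring numbers, and the helper names (tokenise, singular, score, resolve). A real version would query the terms table and would need rules tuned to your data.

```ruby
# Hypothetical stand-in for the `terms` index table; every column name and
# row here is invented for illustration.
Term = Struct.new(:id, :token, :display, :topic, :priority)

TERMS = [
  Term.new(1, "window",  "window",  :word,             1),
  Term.new(2, "windows", "Windows", :operating_system, 2),
  Term.new(3, "unix",    "UNIX",    :word,             1),
  Term.new(4, "unix",    "UNIX",    :operating_system, 2),
  Term.new(5, "hello",   "hello",   :word,             1)
].freeze

# Tokenise: trim and lower-case, so lookups ignore case and stray spaces.
def tokenise(text)
  text.strip.downcase
end

# Very naive singular form: drop one trailing "s".
def singular(token)
  token.sub(/s\z/, "")
end

# True when the first character is an upper-case letter.
def capitalised?(text)
  first = text[0]
  !first.nil? && first != first.downcase
end

# Score from 0 to 100 for how well a raw query matches a term.
# The numbers are arbitrary example rules, not a recommendation.
def score(query, term)
  token = tokenise(query)
  base =
    if token == term.token then 100
    elsif singular(token) == singular(term.token) then 90
    else 0
    end
  return 0 if base.zero?

  # Penalise a capitalisation mismatch ("windows" vs "Windows").
  capitalised?(query.strip) == capitalised?(term.display) ? base : base - 20
end

# Returns [assumed_term, alternatives]; priority breaks score ties.
def resolve(query)
  ranked = TERMS
    .map { |term| [score(query, term), term] }
    .reject { |points, _| points.zero? }
    .sort_by { |points, term| [-points, -term.priority] }
    .map { |_, term| term }
  [ranked.first, ranked.drop(1)]
end
```

With these example rules, resolve("Windows") assumes the operating system and offers the word as an alternative, resolve("windows") does the reverse, and resolve("UNIX") scores both rows equally, so the priority column decides. Updating priority from click data would be one way to let user activity drive that final choice.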