I am working on a content rewriter; basically, it will replace words with their synonyms.
I have the synonyms in a MySQL database. The table contains 3 columns:
id int(11)
keyword varchar(50)
synonyms varchar(255)
Entries look like this:
50 slake abate,slack,decrease,lessen,minify
51 abate slake,slack,decrease,lessen,minify
52 slack slake,abate,decrease,lessen,minify
53 decrease slake,abate,slack,lessen,minify
54 lessen slake,abate,slack,decrease,minify
55 minify slake,abate,slack,decrease,lessen
So my first idea was to extract every word from the text to rewrite (ignoring keywords on a blacklist) and then run a SQL query for each word to see whether a synonym for it exists in the database. But if I have a text with 1000 words, would 1000 SQL queries be too much? Also, some of the synonyms consist of two words (like “throw away”), so I could end up running many more queries than there are words in the text.
Is there a better way to achieve this?
Wouldn’t this be better modelled as a normalised schema:
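A minimal sketch of what such a schema might look like. The table names `WordTable` and `SynonymTable` and the columns `Word`, `WordId` and `SynonymId` come from the index list in this answer; the `Id` column, the column types and the constraints are assumptions. SQLite (via Python's `sqlite3`) is used here only so the example runs standalone; the DDL would need minor adjustments for MySQL.

```python
import sqlite3

# Hypothetical normalised schema: each word is stored once, and each
# synonym relationship is one row linking two word ids.
SCHEMA = """
CREATE TABLE WordTable (
    Id   INTEGER PRIMARY KEY,          -- assumed surrogate key
    Word VARCHAR(50) NOT NULL UNIQUE   -- single words or phrases such as 'throw away'
);
CREATE TABLE SynonymTable (
    WordId    INTEGER NOT NULL REFERENCES WordTable(Id),
    SynonymId INTEGER NOT NULL REFERENCES WordTable(Id),
    PRIMARY KEY (WordId, SynonymId)
);
"""


def create_schema(conn):
    """Create both tables on the given connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)

# List the tables that now exist.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

With this layout, each word is stored once rather than being repeated inside every comma-separated synonym list.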
The synonyms for a word are then, for instance:
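One way the lookup could be written is as a join from the word to its synonym rows and back to the word table. The sketch below assumes a `WordTable(Id, Word)` and a `SynonymTable(WordId, SynonymId)`; the `Id` column, the helper name `synonyms_for` and the sample data loading are illustrative assumptions, and SQLite stands in for MySQL so the example is runnable on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WordTable (Id INTEGER PRIMARY KEY, Word VARCHAR(50) NOT NULL UNIQUE);
CREATE TABLE SynonymTable (
    WordId    INTEGER NOT NULL,
    SynonymId INTEGER NOT NULL,
    PRIMARY KEY (WordId, SynonymId)
);
""")

# Sample data taken from the question: one group of mutual synonyms.
words = ["slake", "abate", "slack", "decrease", "lessen", "minify"]
conn.executemany("INSERT INTO WordTable (Word) VALUES (?)", [(w,) for w in words])

# Link every word in the group to every other word in the group.
conn.execute("""
    INSERT INTO SynonymTable (WordId, SynonymId)
    SELECT a.Id, b.Id FROM WordTable a JOIN WordTable b ON a.Id <> b.Id
""")


def synonyms_for(conn, word):
    """Return the synonyms of `word`, sorted alphabetically."""
    rows = conn.execute(
        """
        SELECT s.Word
        FROM WordTable w
        JOIN SynonymTable st ON st.WordId = w.Id
        JOIN WordTable s ON s.Id = st.SynonymId
        WHERE w.Word = ?
        ORDER BY s.Word
        """,
        (word,),
    )
    return [r[0] for r in rows]


print(synonyms_for(conn, "slake"))
```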
Create indexes on WordTable(Word), SynonymTable(WordId) and SynonymTable(SynonymId).
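As a rough sketch, the three indexes could be created as shown below, and the same join can then look up many words in a single query with an `IN` list instead of issuing one query per word. The index names, the `Id` column, the helper `synonyms_for_many` and the sample rows are assumptions for illustration; SQLite is used in place of MySQL only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WordTable (Id INTEGER PRIMARY KEY, Word VARCHAR(50) NOT NULL);
CREATE TABLE SynonymTable (WordId INTEGER NOT NULL, SynonymId INTEGER NOT NULL);

-- The three indexes named in the answer (index names are hypothetical).
CREATE INDEX idx_word_word       ON WordTable(Word);
CREATE INDEX idx_synonym_word    ON SynonymTable(WordId);
CREATE INDEX idx_synonym_synonym ON SynonymTable(SynonymId);
""")

# Ids are assigned 1..4 in insertion order.
conn.executemany(
    "INSERT INTO WordTable (Word) VALUES (?)",
    [("slake",), ("abate",), ("throw away",), ("discard",)],
)
# Store each synonym pair in both directions.
conn.executemany(
    "INSERT INTO SynonymTable (WordId, SynonymId) VALUES (?, ?)",
    [(1, 2), (2, 1), (3, 4), (4, 3)],
)


def synonyms_for_many(conn, words):
    """Map each word that has synonyms to its synonym list, using one query."""
    words = list(words)
    if not words:
        return {}
    # One "?" placeholder per word, e.g. "?,?,?" for three words.
    placeholders = ",".join("?" * len(words))
    rows = conn.execute(
        f"""
        SELECT w.Word, s.Word
        FROM WordTable w
        JOIN SynonymTable st ON st.WordId = w.Id
        JOIN WordTable s ON s.Id = st.SynonymId
        WHERE w.Word IN ({placeholders})
        ORDER BY w.Word, s.Word
        """,
        words,
    )
    result = {}
    for word, synonym in rows:
        result.setdefault(word, []).append(synonym)
    return result


batch = synonyms_for_many(conn, ["slake", "throw away", "unknown"])
print(batch)
```

Words with no synonyms are simply absent from the result. For very long texts, the word list may need to be split into chunks, since databases limit the number of bound parameters per statement.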
There are several reasons for using this approach: