I’m stuck on a rather tricky problem.
I have a list of products from different warehouses, where each product has a Brand and a Model plus some extra details. The Model for the same product can be spelled quite differently by different warehouses, but the Brand is always the same.
I store the whole list of products in one table; let’s call it the Product table.
Then I have another table, Model, with the CORRECT Model Name, the Brand and additional details such as image, description etc. It also has a keywords column where I try to add all the keywords manually.
And here is the problem: I need to associate each product that I receive from a warehouse with one record from my Model table. Right now I’m using full-text search in boolean mode, but that’s quite painful and does not work very well, so I end up doing a lot of manual work.
Here are just a few examples of the names that I have:
- WINT.SPORT3D
- WINT.SPORT3D XL
- WINT.SPORT 3D
- WINT.SPORT3D MO
- WINTER SPORT 3D
The correct name for all of these items would be: WINTER SPORT 3D, so they should all be assigned to the same model.
So, is there any way to improve the full-text search, or is there some other technique that would solve my problem?
The database I’m using is MySQL, and I would prefer not to change it.
I’ll start by putting together a more formal definition of the tables:
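A minimal sketch of what such definitions could look like. The column types, sizes, key names and the `description`/`image` columns are my assumptions; only `warehouse_id`, `brand`, `model` and `local_id` are names used in this answer.

```sql
-- Hypothetical types and sizes; adjust them to the real data.
-- `local` is quoted with backticks to avoid any clash with the LOCAL keyword.
CREATE TABLE `local` (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    brand       VARCHAR(64)  NOT NULL,
    model       VARCHAR(128) NOT NULL,   -- the CORRECT model name
    description TEXT,
    image       VARCHAR(255),
    PRIMARY KEY (id),
    KEY idx_local_brand (brand)          -- matching is always done within a brand
) ENGINE=InnoDB;

CREATE TABLE warehouse (
    warehouse_id INT UNSIGNED NOT NULL,
    brand        VARCHAR(64)  NOT NULL,
    model        VARCHAR(128) NOT NULL,  -- model name as the warehouse spells it
    local_id     INT UNSIGNED NULL,      -- stays NULL until cross-referenced
    PRIMARY KEY (warehouse_id, brand, model),
    CONSTRAINT fk_warehouse_local
        FOREIGN KEY (local_id) REFERENCES `local` (id)
) ENGINE=InnoDB;
```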
Here I’m using local_id as a foreign key to your ‘Model’ table, but to avoid further confusion I’ll call that table ‘local’.
It seems like the table you describe as ‘product’ is redundant.
Obviously, until the data is cross-referenced, local_id will be null. But once it is populated it won’t have to change, and given a warehouse_id, a brand and a model, you can find your local descriptor easily:
So all you need to do is populate the links. Soundex is a rather crude tool; a better solution for this would be the Levenshtein distance algorithm. There’s a MySQL implementation here
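Something along these lines, assuming the `warehouse` and `local` tables described above; the literal values are placeholders, not real data:

```sql
-- Look up the local (canonical) record for one warehouse product.
SELECT l.*
FROM warehouse AS w
JOIN `local` AS l ON l.id = w.local_id
WHERE w.warehouse_id = 42                 -- placeholder warehouse
  AND w.brand = 'ExampleBrand'            -- placeholder brand
  AND w.model = 'WINT.SPORT3D XL';        -- model as the warehouse spells it
```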
Given a set of rows in the warehouse table which need to be populated:
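As a sketch, assuming unmatched rows are simply those where `local_id` is still NULL:

```sql
-- Warehouse rows that have not been linked to a local record yet.
SELECT w.warehouse_id, w.brand, w.model
FROM warehouse AS w
WHERE w.local_id IS NULL;
```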
…for each row, identify the best match (using the values from the previous query as w.*):
But this will return the closest match even if the two strings are completely different! Hence…
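A rough sketch of that step. MySQL has no built-in Levenshtein function, so `LEVENSHTEIN(a, b)` here is assumed to be a user-defined stored function that returns the edit distance as an integer. The user variables stand in for the `w.brand` and `w.model` values of one unmatched row and hold placeholder values.

```sql
-- Placeholder values standing in for w.brand and w.model of one unmatched row.
SET @w_brand = 'ExampleBrand';
SET @w_model = 'WINT.SPORT3D XL';

-- Closest local model within the same brand; smaller distance = more similar.
-- LEVENSHTEIN() is an assumed user-defined function, not part of MySQL.
SELECT l.id, l.model, LEVENSHTEIN(l.model, @w_model) AS distance
FROM `local` AS l
WHERE l.brand = @w_brand
ORDER BY distance ASC
LIMIT 1;
```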
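One way to add such a cut-off, again assuming a user-defined `LEVENSHTEIN(a, b)` function and placeholder values for the warehouse row: only accept a candidate when the edit distance is less than half the length of the shorter of the two strings.

```sql
-- Placeholder values standing in for w.brand and w.model of one unmatched row.
SET @w_brand = 'ExampleBrand';
SET @w_model = 'WINT.SPORT3D XL';

-- Same best-match query, but candidates that differ too much are rejected.
SELECT l.id
FROM `local` AS l
WHERE l.brand = @w_brand
  AND LEVENSHTEIN(l.model, @w_model)
      < LEAST(CHAR_LENGTH(l.model), CHAR_LENGTH(@w_model)) / 2
ORDER BY LEVENSHTEIN(l.model, @w_model) ASC
LIMIT 1;
```

The factor of 2 is only a starting point; a stricter or looser threshold may suit the real data better.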
…requires at least half the string to match.
So this can be implemented in a single update statement:
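A sketch of such a statement, under the same assumptions as before: a `warehouse` table whose `local_id` is NULL until matched, a `local` table holding the correct names, and a user-defined `LEVENSHTEIN(a, b)` function.

```sql
-- Link every unmatched warehouse row to its closest local model of the same
-- brand, provided the match is close enough. Rows with no acceptable
-- candidate keep local_id = NULL, because the subquery then yields NULL.
UPDATE warehouse AS w
SET w.local_id = (
    SELECT l.id
    FROM `local` AS l
    WHERE l.brand = w.brand
      AND LEVENSHTEIN(l.model, w.model)
          < LEAST(CHAR_LENGTH(l.model), CHAR_LENGTH(w.model)) / 2
    ORDER BY LEVENSHTEIN(l.model, w.model) ASC
    LIMIT 1
)
WHERE w.local_id IS NULL;
```

It is probably worth running the equivalent SELECT first and eyeballing the proposed pairs before letting the UPDATE write anything, since rows left NULL still need manual attention.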