I’m creating an app that will import products from several XML feeds. In the XML there is a category specified, like T-Shirt for instance. The problem is that different resellers specify the categories differently. For instance, what one reseller calls “T-Shirts” another may call “T-Shirt”, a third “short sleeved shirts” and so on.
I want to somehow map these categories to the categories I have myself. So I need some tips on how I should structure my database.
The idea I have is to create a “raw_categories” table which contains the name of the reseller’s category and a “category_id” which has a belongs_to relationship to my own “categories” table. Then when I import, I simply try to find a raw_category with a matching name. If there is one, I pick it; otherwise I add a new one, which I can then manually relate to one of my own categories.
Do you understand what I mean, and is it a good approach? Is there a better/more efficient way?
If this is a good idea, how do I do it in Rails? Should I use something like this (I think I’ve seen something like this in the API doc):
# products model
has_one :category, :through => :raw_categories
I estimate that there will be about 40k to 100k products in the database.
Regards
Linus
Yes, that is the typical design. I would usually call your base table category, and call this alias table categoryAlias. I’m picky about verbiage, but raw_categories has nothing to do with “rawness”; it’s just the categories you want to use.
The other thing I’d suggest, considering the amount of data you will have in these categories, is that when you create a category, you also create a categoryAlias row for it with the same name.
This will make your import code easier, as you will only need to query categoryAlias to determine whether a feed’s category name matches one of your categories or one of its aliases.
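To illustrate the lookup, here is a minimal plain-Ruby sketch, not Rails code: two hashes stand in for the category and categoryAlias tables. The class name CategoryStore, its method names, and the strip/downcase normalization are all my own assumptions for the example, so adapt them to your actual models.

```ruby
# Plain-Ruby sketch: hashes stand in for the category and categoryAlias tables.
# All names here (CategoryStore, resolve, map_alias, ...) are hypothetical.
class CategoryStore
  def initialize
    @categories = {} # category id => category name
    @aliases    = {} # normalized alias name => category id (nil = not mapped yet)
    @next_id    = 1
  end

  # Creating a category also creates an alias row with the same name.
  def create_category(name)
    id = @next_id
    @next_id += 1
    @categories[id] = name
    @aliases[normalize(name)] = id
    id
  end

  # Manually relate a reseller's category name to one of your own categories.
  def map_alias(alias_name, category_id)
    @aliases[normalize(alias_name)] = category_id
  end

  # Import step: only the alias table is consulted. An unknown name is
  # recorded as an unmapped alias so it can be related by hand later.
  def resolve(feed_name)
    key = normalize(feed_name)
    @aliases[key] = nil unless @aliases.key?(key)
    @aliases[key]
  end

  # Alias names that still need to be mapped manually.
  def unmapped
    @aliases.select { |_name, id| id.nil? }.keys
  end

  private

  # Assumption: matching ignores case and surrounding whitespace.
  def normalize(name)
    name.strip.downcase
  end
end

store   = CategoryStore.new
tshirts = store.create_category("T-Shirts")
store.map_alias("T-Shirt", tshirts)

store.resolve("T-Shirts")             # found through the category's own alias
store.resolve("T-Shirt")              # found through the manually mapped alias
store.resolve("short sleeved shirts") # unknown: stored unmapped, returns nil
```

Because every category has an alias with its own name, resolve never has to look at the category table directly. In a real app the two hashes would be database tables, and the unmapped aliases would be the ones you relate to your categories by hand.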