I’m having trouble getting my head around this one.
I have a table with the following structure. It contains about 5 million rows.
Id bigint primary identity, auto increment
SKU int
Keyword nvarchar(200)
KeywordType nvarchar(1)
The table holds every possible keyword, in multiple languages, for a given SKU. For example, a Lord of the Rings product may have 100 records, one per acceptable keyword, all with the same SKU.
Ignore KeywordType for now.
Issue #1: How can I write a SQL query to return records based on an input such as “Lord Rings”?
Issue #2: The KeywordType field is a weird one. It’s to be used to filter records based on the format, e.g. CD, DVD, etc. Thus, for a given result set of SKUs, rows with a KeywordType value of “X” are used to filter the results further by format.
For example, a user searches for “Lord Rings” with a DVD filter.
I need the results from issue #1, restricted to those SKUs that also have a Keyword of “DVD” AND a KeywordType of “X”.
Finally, I’m looking for an ANDed solution.
Thanks. Hope someone can help…
Here is some sample data (Id, SKU, Keyword, KeywordType) for a particular SKU, Lord of the Rings: The Two Towers:
650446 12288 DVD F
650452 12288 LORD T
650453 12288 LTD X
650454 12288 MOVIE A
650455 12288 OF T
650457 12288 RINGS T
650460 12288 THE T
650461 12288 TOURS X
650462 12288 TOWERS T
650463 12288 TWO T
If the user inputs “Lord Rings” then I would expect to get the above SKU returned in the search results.
The question is a bit confusing but I think you need:
1) A way to parse the user input (e.g. “Lord Rings”) into individual keywords (e.g. (‘Lord’, ‘Rings’)). This would preferably be done at the application level, but it can be done in SQL / PSQL / TSQL, i.e. in almost any flavor of SQL.
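As a rough illustration of the application-level parsing, here is a minimal Python sketch. The function name `parse_keywords` is made up for this example, and upper-casing is an assumption based on the sample rows, which store keywords in upper case.

```python
import re


def parse_keywords(user_input: str) -> list[str]:
    """Split raw search text into distinct upper-case keywords, keeping input order."""
    # \w+ grabs runs of letters/digits/underscores, so punctuation and
    # repeated whitespace are discarded.
    tokens = re.findall(r"\w+", user_input.upper())
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(tokens))


if __name__ == "__main__":
    print(parse_keywords("Lord  rings, lord"))  # ['LORD', 'RINGS']
```

De-duplicating here matters if the query later compares a match count against the number of keywords supplied.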
2) A SQL query like this (derived from Daren Schwenke’s solution)
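One common shape for such an ANDed lookup is `IN (...)` combined with `GROUP BY SKU HAVING COUNT(DISTINCT Keyword) = <number of keywords>`. The sketch below is not necessarily the query from the answer referred to above; it runs against an in-memory SQLite database so that it is self-contained, and the same SELECT should port to T-SQL with little change. The table name `ProductKeyword`, the helper names, and the extra SKU 99999 row are all hypothetical. Note that the sample data stores DVD with KeywordType F rather than X, so the format filter is passed in as a parameter.

```python
import sqlite3

# Rows copied from the sample data in the question, plus one made-up row
# (SKU 99999) so the AND semantics have something to exclude.
ROWS = [
    (650446, 12288, "DVD", "F"), (650452, 12288, "LORD", "T"),
    (650453, 12288, "LTD", "X"), (650454, 12288, "MOVIE", "A"),
    (650455, 12288, "OF", "T"), (650457, 12288, "RINGS", "T"),
    (650460, 12288, "THE", "T"), (650461, 12288, "TOURS", "X"),
    (650462, 12288, "TOWERS", "T"), (650463, 12288, "TWO", "T"),
    (1, 99999, "LORD", "T"),  # hypothetical: matches LORD but not RINGS
]


def make_demo_db():
    """Create an in-memory table shaped like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ProductKeyword ("
        "Id INTEGER PRIMARY KEY, SKU INTEGER, Keyword TEXT, KeywordType TEXT)"
    )
    conn.executemany("INSERT INTO ProductKeyword VALUES (?, ?, ?, ?)", ROWS)
    return conn


def find_skus(conn, keywords, format_filter=None):
    """Return SKUs that have ALL of the given keywords.

    format_filter is an optional (keyword, keyword_type) pair that the SKU
    must also have, e.g. ("DVD", "X"). Keywords are compared as given, so
    pass them in the same case as they are stored.
    """
    if not keywords:
        return []
    placeholders = ", ".join("?" for _ in keywords)
    sql = f"SELECT SKU FROM ProductKeyword WHERE Keyword IN ({placeholders}) "
    params = list(keywords)
    if format_filter is not None:
        sql += ("AND SKU IN (SELECT SKU FROM ProductKeyword "
                "WHERE Keyword = ? AND KeywordType = ?) ")
        params.extend(format_filter)
    # A SKU qualifies only if it matched every distinct keyword.
    sql += "GROUP BY SKU HAVING COUNT(DISTINCT Keyword) = ? ORDER BY SKU"
    params.append(len(set(keywords)))
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    db = make_demo_db()
    print(find_skus(db, ["LORD", "RINGS"]))                # [12288]
    print(find_skus(db, ["LORD", "RINGS"], ("DVD", "F")))  # [12288]
    print(find_skus(db, ["LORD", "RINGS"], ("DVD", "X")))  # []
```

The keywords are bound as parameters rather than concatenated into the SQL text, which avoids injection problems when the input comes straight from a search box.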
Note: this structure is a poor fit for what is essentially a form of full-text search. The right indexes can help; at a glance, we would need at a minimum:
– (Keyword, SKU)
– (KeywordType, Keyword, SKU)
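The two composite indexes listed above could be created as in the following sketch. It uses SQLite through Python only so that it can be run as-is; the plain `CREATE INDEX ... ON ... (...)` form is also accepted by SQL Server. The table name `ProductKeyword` and both index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductKeyword ("
    "Id INTEGER PRIMARY KEY, SKU INTEGER, Keyword TEXT, KeywordType TEXT)"
)

# Supports lookups by keyword; SKU is included so the search query can be
# answered from the index alone.
conn.execute("CREATE INDEX IX_Keyword_SKU ON ProductKeyword (Keyword, SKU)")

# Supports the format filter, which fixes both KeywordType and Keyword.
conn.execute(
    "CREATE INDEX IX_Type_Keyword_SKU "
    "ON ProductKeyword (KeywordType, Keyword, SKU)"
)

index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)

# Ask the engine how it would run a keyword lookup; the exact wording of
# the plan differs between SQLite versions, so it is only printed here.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT SKU FROM ProductKeyword WHERE Keyword = ?",
    ("LORD",),
):
    print(row)
```

On a 5-million-row table it is worth checking the actual execution plan on the real engine before and after adding each index, since every extra index also slows down inserts.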
A few other things could help, for example excluding common “noise words” such as “OF”, “THE”, “A”, “TO” from the index (and of course from the search criteria supplied by end users).
On the whole, though, it may be a good idea to assess the wisdom of proceeding with this structure. It may make sense for the specific application at hand; only the OP can know this…
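On the search side, the noise-word filtering might look like this minimal Python sketch. The set contains only the four words named above; a real list would be longer and language-specific, and the name `drop_noise_words` is made up for this example.

```python
# Only the four example words from the text; extend as needed.
NOISE_WORDS = frozenset({"OF", "THE", "A", "TO"})


def drop_noise_words(keywords):
    """Return the keywords with noise words removed, keeping their order."""
    return [k for k in keywords if k.upper() not in NOISE_WORDS]


if __name__ == "__main__":
    print(drop_noise_words(["LORD", "OF", "THE", "RINGS"]))  # ['LORD', 'RINGS']
```

If a search consists only of noise words, the result is an empty list, and the application has to decide whether to reject the search or fall back to the unfiltered input.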