I have a function that takes two delimited strings and returns the number of elements they have in common. The main code of the function is below (@commonCount holds the return value):
```sql
SET @commonCount = (
    select count(*)
    from (
        select token from dbo.splitString(@userKeywords, ';')
        intersect
        select token from dbo.splitString(@itemKeywords, ';')
    ) as total
)
```
Here, splitString uses a WHILE loop and CHARINDEX to split a string into tokens on the delimiter, inserting each token into the table it returns.
The problem is that this processes only about 100 rows per second. At the size of my dataset, it would take about 8-10 days to finish.
Each of the two strings can be up to 1500 characters long.
Is there any way to make this fast enough to be usable?
The performance problem is probably the combination of row-by-row processing (the WHILE loop inside splitString behaves much like a cursor) and the user-defined functions, which are called once for every row.
If one of these strings is constant (such as the item keywords), you can search for each of its keywords independently:
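Here is a minimal sketch of that idea. It assumes the user keywords live in a hypothetical `users(user_id, keywords)` table and that the constant item keywords are known in advance. It uses Python's built-in `sqlite3` module so it runs without a server. Wrapping both the stored string and each search term in `;` makes `LIKE` match whole tokens only. It also assumes keywords contain no `%` or `_`, because `LIKE` treats those as wildcards.

```python
import sqlite3


def common_counts(conn, item_keywords):
    """Per user, count how many of the constant item keywords occur in users.keywords."""
    # Drop duplicate search terms so each shared keyword is counted once,
    # matching the INTERSECT semantics of the original query.
    item_keywords = list(dict.fromkeys(item_keywords))
    if not item_keywords:
        return [(uid, 0) for (uid,) in conn.execute("SELECT user_id FROM users ORDER BY user_id")]
    # ';' on both sides makes '%;blue;%' match the whole token 'blue' but not
    # 'lightblue'. In SQLite each LIKE evaluates to 0 or 1, so the tests can be added.
    tests = " + ".join(["((';' || keywords || ';') LIKE ?)"] * len(item_keywords))
    # Only placeholders are interpolated into the SQL text; the keywords are bound as parameters.
    params = [f"%;{kw};%" for kw in item_keywords]
    sql = f"SELECT user_id, {tests} AS common_count FROM users ORDER BY user_id"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, keywords TEXT);
    INSERT INTO users VALUES (1, 'red;blue;green'), (2, 'lightblue;yellow'), (3, 'purple');
""")
counts = common_counts(conn, ["blue", "green"])
print(counts)  # [(1, 2), (2, 0), (3, 0)]
```

In T-SQL, each test would need to be wrapped in `CASE WHEN ... THEN 1 ELSE 0 END` before adding. A leading-wildcard `LIKE` still scans every row, but it avoids both the `WHILE` loop and the per-row function call.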
Alternatively, a set-based approach can work, but you have to normalize the data (a plug here for storing data in the right format to begin with). That is, you want one table that has a row for each user and keyword,
and another that has a row for each item and keyword
(if there are different types of items; otherwise, this is just a list of keywords).
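Here is a minimal sketch of those two tables. It again uses Python's `sqlite3` so it runs anywhere. The names `user_keywords`, `item_keywords`, `user_id`, `item_id` and `keyword` are hypothetical, and SQL Server would use its own column types. The composite primary keys store each keyword at most once per user or item. The keyword-first index supports joining on `keyword`.

```python
import sqlite3


def create_keyword_tables(conn: sqlite3.Connection) -> None:
    # One row per (user, keyword) pair; the primary key rules out duplicates.
    conn.execute("""
        CREATE TABLE user_keywords (
            user_id INTEGER NOT NULL,
            keyword TEXT    NOT NULL,
            PRIMARY KEY (user_id, keyword)
        )""")
    # One row per (item, keyword) pair.
    conn.execute("""
        CREATE TABLE item_keywords (
            item_id INTEGER NOT NULL,
            keyword TEXT    NOT NULL,
            PRIMARY KEY (item_id, keyword)
        )""")
    # Keyword-first index so the engine can find all items sharing a keyword.
    conn.execute("CREATE INDEX ix_item_keywords_keyword ON item_keywords (keyword, item_id)")


conn = sqlite3.connect(":memory:")
create_keyword_tables(conn)
conn.executemany("INSERT INTO user_keywords VALUES (?, ?)", [(1, "red"), (1, "blue")])
conn.executemany("INSERT INTO item_keywords VALUES (?, ?)", [(10, "blue"), (10, "green")])
```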
Then your query would look like:
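Here is a sketch of such a query, run in SQLite against small hypothetical versions of the normalized tables. It counts shared keywords for every user/item pair with a single join and `GROUP BY`. The query text uses only standard joins and aggregation, so it should also run in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_keywords (user_id INTEGER, keyword TEXT, PRIMARY KEY (user_id, keyword));
    CREATE TABLE item_keywords (item_id INTEGER, keyword TEXT, PRIMARY KEY (item_id, keyword));
    INSERT INTO user_keywords VALUES (1, 'red'), (1, 'blue'), (2, 'green');
    INSERT INTO item_keywords VALUES (10, 'blue'), (10, 'red'), (11, 'green'), (11, 'blue');
""")

# Count shared keywords for every (user, item) pair in one set-based statement.
# Pairs with nothing in common produce no join rows, so they do not appear.
COMMON_COUNT_SQL = """
    SELECT uk.user_id, ik.item_id, COUNT(*) AS common_count
    FROM user_keywords AS uk
    JOIN item_keywords AS ik ON ik.keyword = uk.keyword
    GROUP BY uk.user_id, ik.item_id
    ORDER BY uk.user_id, ik.item_id
"""
rows = conn.execute(COMMON_COUNT_SQL).fetchall()
print(rows)  # [(1, 10, 2), (1, 11, 1), (2, 11, 1)]
```

The primary keys rule out duplicate keywords, so `COUNT(*)` gives the same number the original `INTERSECT` did. To compute a single pair, add a filter such as `WHERE uk.user_id = ? AND ik.item_id = ?`.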
And the SQL engine would work its magic.
Now, how can you create such a list? If you have only a handful of keywords per user, then you can do something like:
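One set-based way to build the list is a recursive CTE. This is a substitute technique, and it is not limited to a handful of keywords. The sketch below runs in SQLite via Python's `sqlite3`, and the source table `users(user_id, keywords)` is hypothetical. Appending a trailing `;` guarantees every token is terminated, empty tokens are skipped, and `INSERT OR IGNORE` (SQLite syntax) drops duplicate keywords within one string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, keywords TEXT);
    INSERT INTO users VALUES (1, 'red;blue;green'), (2, 'blue;;yellow;blue'), (3, '');
    CREATE TABLE user_keywords (
        user_id INTEGER NOT NULL,
        keyword TEXT NOT NULL,
        PRIMARY KEY (user_id, keyword)
    );
""")

# Each recursive step peels the first token off `rest`. Appending ';' to the
# input guarantees every token ends with a delimiter, so the recursion stops.
SPLIT_SQL = """
    WITH RECURSIVE split(user_id, keyword, rest) AS (
        SELECT user_id, '', keywords || ';' FROM users
        UNION ALL
        SELECT user_id,
               substr(rest, 1, instr(rest, ';') - 1),
               substr(rest, instr(rest, ';') + 1)
        FROM split
        WHERE rest <> ''
    )
    INSERT OR IGNORE INTO user_keywords (user_id, keyword)
    SELECT user_id, keyword FROM split WHERE keyword <> ''
"""
conn.execute(SPLIT_SQL)
rows = conn.execute(
    "SELECT user_id, keyword FROM user_keywords ORDER BY user_id, keyword"
).fetchall()
```

SQL Server also supports recursive CTEs. It omits the `RECURSIVE` keyword and uses `SUBSTRING` and `CHARINDEX`. Its default recursion limit of 100 may need raising with `OPTION (MAXRECURSION ...)` for 1500-character strings.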
To get the maximum number of elements in itemKeyWords, you can use the following query:
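Here is a sketch of that kind of query in SQLite, using a hypothetical `items(item_id, item_keywords)` table. For each string, it subtracts the length after removing every `;` from the original length to count the delimiters, then adds one. It assumes there are no leading, trailing or doubled delimiters. In SQL Server the length function is `LEN` rather than `LENGTH`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_keywords TEXT);
    INSERT INTO items VALUES (10, 'blue;red'), (11, 'a;b;c;d'), (12, 'solo');
""")

# Elements per string = delimiters + 1; the delimiter count is the length lost
# when every ';' is removed. MAX picks the longest list across all items.
MAX_ELEMENTS_SQL = """
    SELECT MAX(LENGTH(item_keywords) - LENGTH(REPLACE(item_keywords, ';', '')) + 1)
    FROM items
"""
max_elements = conn.execute(MAX_ELEMENTS_SQL).fetchone()[0]
print(max_elements)  # 4
```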