I have a table that contains URL strings, e.g.
/A/B/C
/C/E
/C/B/A/R
Each string is split into tokens, where the separator in my case is ‘/’. Then I assign an integer value to each token and put them into a dictionary (a different database table), e.g.
A : 1
B : 2
C : 3
E : 4
D : 5
G : 6
R : 7
My problem is to find the rows in the first table that contain a given sequence of tokens. An additional problem is that my input is a sequence of ints, e.g. I have
3, 2
and I’d like to find the following rows
/A/B/C
/C/B/A/R
How can I do this efficiently? By this I mean: how should I design a proper database structure?
I use PostgreSQL; the solution should work well for 2 million rows in the first table.
To clarify my example – I need both ‘B’ AND ‘C’ to be in the URL. Also ‘B’ and ‘C’ can occur in any order in the URL.
I need an efficient SELECT. INSERT does not have to be efficient. I do not have to do all the work in SQL if this changes anything.
Thanks in advance
I’m not sure how to do this, but I’m just giving you some ideas that might be useful. You already have your initial table. You process it and create the token table:
That’s ok for me. Now, what I would do is create a new table (OrderedTokens) that matches the rows of the original table with the tokens of the token table. Something like:
This way you can even recreate your original table, as long as you use the order field. For example:
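A minimal sketch of this layout, not necessarily the exact tables meant above. It assumes hypothetical names (tokens, ordered_tokens, url_id, position) and uses an in-memory SQLite database from Python only so that it runs on its own; the SQL is plain enough that it should carry over to PostgreSQL with little change.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")  # SQLite only to keep the sketch self-contained
conn.executescript("""
    CREATE TABLE tokens (
        id    INTEGER PRIMARY KEY,
        token TEXT UNIQUE NOT NULL
    );
    CREATE TABLE ordered_tokens (
        url_id   INTEGER NOT NULL,   -- which URL the token belongs to
        position INTEGER NOT NULL,   -- the "order" field: 1 = first token
        token_id INTEGER NOT NULL REFERENCES tokens(id),
        PRIMARY KEY (url_id, position)
    );
""")

# The dictionary from the question.
conn.executemany(
    "INSERT INTO tokens VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C"), (4, "E"), (5, "D"), (6, "G"), (7, "R")],
)
token_ids = {token: token_id
             for token_id, token in conn.execute("SELECT id, token FROM tokens")}

# Split each URL on '/' and store one row per token, keeping its position.
for url_id, url in enumerate(["/A/B/C", "/C/E", "/C/B/A/R"], start=1):
    for position, token in enumerate(url.strip("/").split("/"), start=1):
        conn.execute("INSERT INTO ordered_tokens VALUES (?, ?, ?)",
                     (url_id, position, token_ids[token]))

def rebuild_urls(conn):
    """Recreate every URL string by joining its tokens in position order."""
    rows = conn.execute("""
        SELECT ot.url_id, t.token
        FROM ordered_tokens ot
        JOIN tokens t ON t.id = ot.token_id
        ORDER BY ot.url_id, ot.position
    """)
    rebuilt = {}
    for url_id, token in rows:
        rebuilt[url_id] = rebuilt.get(url_id, "") + "/" + token
    return rebuilt

print(rebuild_urls(conn))
```

In PostgreSQL the concatenation could probably be done in SQL itself, for instance with string_agg and an ORDER BY on the position column, instead of in client code.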
The previous query would result in:
So, you don’t even need your original table anymore. If you want to get URLs that have any of the provided token ids (in this case B OR C), you should use this:
This results in:
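A sketch of such an any-of filter, under assumptions of my own: a hypothetical ordered_tokens(url_id, position, token_id) table, filled with the three sample URLs and the dictionary from the question, in an in-memory SQLite database so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite only to keep the sketch self-contained
# Hypothetical table: one row per (URL, position) pair.
conn.execute(
    "CREATE TABLE ordered_tokens (url_id INTEGER, position INTEGER, token_id INTEGER)"
)
conn.executemany(
    "INSERT INTO ordered_tokens VALUES (?, ?, ?)",
    [
        (1, 1, 1), (1, 2, 2), (1, 3, 3),              # /A/B/C
        (2, 1, 3), (2, 2, 4),                         # /C/E
        (3, 1, 3), (3, 2, 2), (3, 3, 1), (3, 4, 7),   # /C/B/A/R
    ],
)

def urls_with_any(conn, token_ids):
    """Return ids of URLs that contain at least one of the given token ids."""
    wanted = sorted(set(token_ids))
    if not wanted:
        return []  # an empty IN () list is not valid in PostgreSQL
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT DISTINCT url_id
        FROM ordered_tokens
        WHERE token_id IN ({placeholders})
        ORDER BY url_id
    """
    return [row[0] for row in conn.execute(sql, wanted)]

any_match = urls_with_any(conn, [2, 3])  # B OR C
print(any_match)
```

DISTINCT is there because a URL that contains several of the requested tokens would otherwise be returned once per matching row.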
Now, if you want to get all URLs that have BOTH ids, then try this:
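One way such a query might look, as a hedged sketch rather than a definitive implementation. It assumes a hypothetical ordered_tokens(url_id, position, token_id) table holding the question's sample data, and runs against in-memory SQLite so it is self-contained. The index is also an assumption of mine, aimed at the SELECT-heavy workload described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite only to keep the sketch self-contained
# Hypothetical table: one row per (URL, position) pair.
conn.execute(
    "CREATE TABLE ordered_tokens (url_id INTEGER, position INTEGER, token_id INTEGER)"
)
# Assumed index so the filter on token_id does not have to scan every row.
conn.execute(
    "CREATE INDEX ordered_tokens_token_idx ON ordered_tokens (token_id, url_id)"
)
conn.executemany(
    "INSERT INTO ordered_tokens VALUES (?, ?, ?)",
    [
        (1, 1, 1), (1, 2, 2), (1, 3, 3),              # /A/B/C
        (2, 1, 3), (2, 2, 4),                         # /C/E
        (3, 1, 3), (3, 2, 2), (3, 3, 1), (3, 4, 7),   # /C/B/A/R
    ],
)

def urls_with_all(conn, token_ids):
    """Return ids of URLs that contain every one of the given token ids."""
    wanted = sorted(set(token_ids))
    if not wanted:
        return []  # an empty IN () list is not valid in PostgreSQL
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT url_id
        FROM ordered_tokens
        WHERE token_id IN ({placeholders})
        GROUP BY url_id
        HAVING COUNT(DISTINCT token_id) = ?   -- must match the number of ids listed
        ORDER BY url_id
    """
    return [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]

all_match = urls_with_all(conn, [3, 2])  # C AND B, in any order
print(all_match)
```

COUNT(DISTINCT ...) is used instead of COUNT(*) so that a URL repeating one requested token several times is not mistaken for a URL that contains all of them.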
Add to the count all the ids you want to filter on, and then make that count equal to the number of ids you added. The previous query will result in:
The funny thing is that none of the solutions I provided produces your expected result. So, have I misunderstood your requirements, or is the expected result you provided wrong?
Let me know if this is correct.