I have a MySQL table in which a column contains string prefixes. For instance, these prefixes could be top-level directories on a Unix file system:
my_table:
+---------+
| prefix  |
+---------+
| /usr/   |
| /bin/   |
| /var/   |
| /lib/   |
+---------+
How can I write a query that efficiently finds all rows in this table where the value of the prefix column is the beginning of a given string?
For instance, given the string ‘/usr/bin/cat’, how can I write a query that finds the row containing ‘/usr/’, which is the beginning of ‘/usr/bin/cat’?
My first guess is to use LIKE this way:
SELECT * FROM my_table
WHERE '/usr/bin/cat' LIKE CONCAT(prefix, '%')
But I’m afraid this query won’t be using the index I have on the prefix column.
I also came up with the following:
SELECT * FROM my_table
WHERE prefix <= '/usr/bin/cat' ORDER BY prefix DESC LIMIT 1
This retrieves the prefix equal to or immediately preceding ‘/usr/bin/cat’ in lexicographical order. I can then verify whether ‘/usr/bin/cat’ actually begins with that prefix or not.
But that only works with a single row and I wonder if that’s the optimal solution.
Edit: I used root directories as an example, but I’d like to know if there’s a way to deal with arbitrary strings as well. Perhaps these strings won’t contain path separators, or the prefix could be several levels deep. Say: ‘/usr/lib’.
Edit: It seems that my second query is bogus. ‘/usr/’ is smaller than ‘/usr/bin/cat’ but so is ‘/usr/a’. That query is still much faster than a full table scan on a large table but to make it work I have to fetch more rows and go through them until I find the first actual prefix.
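To make the "fetch more rows and go through them" idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: SQLite's in-memory database stands in for MySQL so that it runs without a server, the function name `first_actual_prefix` and the `batch` limit are made up for the example, and comparisons are byte-wise and case-sensitive, which may not match the collation of a real MySQL column.

```python
import sqlite3


def first_actual_prefix(conn, s, batch=100):
    """Walk rows <= s in descending order and return the first real prefix of s.

    Returns None if no prefix is found within the first `batch` rows.
    """
    cur = conn.execute(
        "SELECT prefix FROM my_table WHERE prefix <= ? "
        "ORDER BY prefix DESC LIMIT ?",
        (s, batch),
    )
    for (prefix,) in cur:
        # Rows such as '/usr/a' sort before s without being a prefix of it,
        # so each candidate has to be checked explicitly.
        if s.startswith(prefix):
            return prefix
    return None


# Stand-in data: the table from the question plus the '/usr/a' false positive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (prefix TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO my_table (prefix) VALUES (?)",
    [("/usr/",), ("/bin/",), ("/var/",), ("/lib/",), ("/usr/a",)],
)

print(first_actual_prefix(conn, "/usr/bin/cat"))
```

Because a shorter prefix always sorts before a longer one that extends it, the first match found in descending order is also the longest stored prefix. The weak point is the number of non-matching rows that may sit between the search string and that prefix, which is what `batch` caps here.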
So it seems an index can help in this kind of prefix search but I still don’t know the best way to take advantage of it.
-- Situation: we do not know where the string can be cut.
-- But we must know the maximal length of the prefix.
-- EDIT: It would also help to know the minimal length of the prefix, to eliminate lots of false positives that we do not want to find (min = 2 characters).
-- This will definitely use the index. In this example the maximal length is 8 characters, so x = 8.
-- In your application, just try to generate an SQL query like this:
-- No full table scan, just (x - min + 1) uses of the index. Hopefully this will be FAST enough! 🙂
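The generated query itself is not shown above. As a rough sketch of what generating one in application code could look like, the Python below builds one candidate per allowed length and matches them with an `IN` list. The `IN` form, the function names, and the default `min_len`/`max_len` values are my assumptions, not the answer's actual query; the `%s` placeholders follow the parameter style used by common MySQL drivers such as PyMySQL.

```python
def candidate_prefixes(s, min_len=2, max_len=8):
    """Leading substrings of s with lengths min_len..max_len, longest first."""
    upper = min(max_len, len(s))
    return [s[:n] for n in range(upper, min_len - 1, -1)]


def build_prefix_query(s, min_len=2, max_len=8):
    """Return (sql, params) matching rows whose prefix equals a candidate.

    Table and column names are taken from the question. Returns (None, [])
    when s is shorter than min_len, since no candidate exists.
    """
    candidates = candidate_prefixes(s, min_len, max_len)
    if not candidates:
        return None, []
    placeholders = ", ".join(["%s"] * len(candidates))
    sql = f"SELECT * FROM my_table WHERE prefix IN ({placeholders})"
    return sql, candidates


sql, params = build_prefix_query("/usr/bin/cat")
print(sql)
print(params)
```

With x = 8 and min = 2 this produces 8 - 2 + 1 = 7 candidates, from ‘/usr/bin’ down to ‘/u’, and each one is an exact-match lookup that the index on prefix can serve. The pair would then be passed to the driver as something like `cursor.execute(sql, params)`.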