I have a large MySQL InnoDB table (about 1 million records, growing by 300K weekly), let’s say of blog posts. The table has an indexed url field.
When adding new records, I check whether a record with the same URL already exists. Here is what the query looks like:
SELECT COUNT(*) FROM `tablename` WHERE url='http://www.google.com/';
Currently the system runs about 10-20 of these queries per second, and that number will grow. I’m thinking about improving performance by adding an extra field that holds the MD5 hash of the URL:
SELECT COUNT(*) FROM `tablename` WHERE md5url=MD5('http://www.google.com/');
The hash would be shorter and of constant length, which should make for a better index than the URL field. What do you think? Does this make sense?
A friend of mine suggested using CRC32 instead of MD5, but I’m not sure how unique CRC32 results would be. Let me know what you think about CRC32 for this role.
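To put a rough number on the uniqueness question, here is a back-of-the-envelope birthday-bound estimate in Python. It is only a sketch: it assumes the hash spreads URLs uniformly over its output range (an approximation, especially for CRC32, which is a checksum rather than a cryptographic hash), and `collision_stats` is a hypothetical helper name, not part of any library.

```python
import math


def collision_stats(n_rows: int, hash_bits: int) -> tuple[float, float]:
    """Birthday-bound estimate for n_rows distinct values hashed
    uniformly (an assumption) into 2**hash_bits possible outputs.

    Returns (expected number of colliding pairs,
             approximate probability of at least one collision).
    """
    buckets = 2.0 ** hash_bits
    pairs = n_rows * (n_rows - 1) / 2          # number of distinct row pairs
    expected_colliding_pairs = pairs / buckets
    # Poisson approximation: P(at least one) = 1 - exp(-expected)
    p_any = -math.expm1(-expected_colliding_pairs)
    return expected_colliding_pairs, p_any


if __name__ == "__main__":
    # 1 million rows, as in the table described above
    for name, bits in (("CRC32", 32), ("MD5", 128)):
        expected, p_any = collision_stats(1_000_000, bits)
        print(f"{name}: ~{expected:.3g} colliding pairs expected, "
              f"P(any collision) ~ {p_any:.3g}")
```

Under these assumptions, 1 million URLs give on the order of a hundred colliding pairs with a 32-bit hash, so a CRC32 column could only narrow the search and the query would still need to compare the url itself. With a 128-bit hash such as MD5 the estimated collision probability is negligible at this table size.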
UPDATE: the URL column is unique for each row.
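Create a non-clustered (secondary) index on URL. That lets the SQL engine do the optimization internally and should produce the best results!
If you create an index on a VARCHAR column, the engine manages the lookup structure internally anyway (in InnoDB this is a B-tree, which the adaptive hash index feature can supplement with an in-memory hash for frequently accessed pages), and using the index can give better performance than an unindexed scan by an order of magnitude or even more!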
Also, something to keep in mind if you’re only checking whether a URL exists: certain SQL products will produce faster results with a query like this: