I know creating a short URL algorithm isn’t as simple as hashing a URL and then chopping the hash down to some truncated version of itself, even though from the outside that is what appears to be happening. I’ve read a few articles on the idea and seen a couple in action as well, but none of them seem to worry about future-proofing it.
So I am trying to find out how to approach this with PHP and how to avoid at least the most common problems, from database conflicts to whatever else there may be to worry about besides overall storage and database size.
One problem I will definitely face: the service I am creating takes user-side URLs from another service my buddy is building, and we track short URLs on a per-user basis. So multiple users could end up submitting the exact same long URL, but each user who supplies it needs a different short URL id. Think of several users sharing a YouTube video that recently went viral.
So what’s the best tactic for creating a short URL algorithm that won’t face many collisions, while at the same time letting me query my DB with a handful of possible short URLs to see whether they already exist?
Better yet, is there some way to create unique IDs with MySQL functionality, something that would in concept loop until an ID is unique and then create it?
I know I’m grasping at straws here and this is a rather open question, but I am trying to think tactically before getting deep into the build, only to find out later that I messed up badly. I need some input beforehand to make sure I am taking a semi-sane approach.
You can use this short URL algorithm written in PHP; it generates four different “hashes” of the same URL.
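The article’s own function isn’t reproduced here, so the following is only a minimal sketch of the same idea, assuming PHP 7.4+ on a 64-bit build: take the `md5()` of the URL, cut the 32 hex characters into four 8-character (32-bit) slices, and turn each slice into a 6-character base-62 code. The function name `shortUrlCandidates` and the alphabet are hypothetical choices for illustration.

```php
<?php
// Hypothetical helper, NOT the article's function: a sketch of the
// "four hashes from one URL" idea using md5() and base-62 encoding.
function shortUrlCandidates(string $url, int $length = 6): array
{
    // 62 URL-safe characters (assumed alphabet).
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $hex = md5($url); // 32 hex characters = 128 bits

    $candidates = [];
    for ($i = 0; $i < 4; $i++) {
        // One 8-hex-character (32-bit) slice per candidate; fits in a 64-bit int.
        $n = hexdec(substr($hex, $i * 8, 8));
        $code = '';
        for ($j = 0; $j < $length; $j++) {
            // Emit base-62 digits, least significant first.
            $code .= $alphabet[$n % 62];
            $n = intdiv($n, 62);
        }
        $candidates[] = $code;
    }
    return $candidates;
}

print_r(shortUrlCandidates('https://example.com/some/long/path'));
```

The output is deterministic: the same URL always yields the same four codes, which is what makes it possible to check a fixed, small set of candidates against the database.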
Create a table to hold the mappings, with at least an `id` column, the original long URL, and a `short_url` column.
When the user inputs a URL to shorten, call the function from the article and you receive an array of four different hashes. Then you can use a query like:
```sql
SELECT id FROM {your_table} WHERE short_url = "{a_hash_from_the_function}"
```

If the query returns no rows, there is no match and you can use that hash. If it returns a row, take the next hash from the array, check whether it exists, and so on.
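The “try the next hash” loop can be sketched as a small function that walks the candidate array and returns the first code not already in use. To keep it runnable without a database, the existence check is passed in as a callable; the name `pickFreeShortCode`, the in-memory `$taken` list, and the PDO/table names in the comments are all hypothetical. Assumes PHP 7.4+.

```php
<?php
// Hypothetical helper: returns the first candidate for which $exists() is false,
// or null if every candidate is already taken.
function pickFreeShortCode(array $candidates, callable $exists): ?string
{
    foreach ($candidates as $code) {
        if (!$exists($code)) {
            return $code;
        }
    }
    return null; // all candidates collided; the caller needs a fallback
}

// In-memory stand-in for the table, for illustration only.
$taken = ['aB3xY9' => true, 'Qw12Er' => true];
$exists = fn(string $code): bool => isset($taken[$code]);

// Against a real database, $exists could run the SELECT above through a
// prepared statement (assumes a PDO handle $pdo and a table named short_urls):
// $stmt = $pdo->prepare('SELECT id FROM short_urls WHERE short_url = ?');
// $exists = function (string $code) use ($stmt): bool {
//     $stmt->execute([$code]);
//     $found = $stmt->fetchColumn() !== false;
//     $stmt->closeCursor();
//     return $found;
// };

var_dump(pickFreeShortCode(['aB3xY9', 'zz99zz'], $exists));
```

One design caveat: checking with a SELECT and then inserting leaves a small window in which two requests can pick the same code. Putting a UNIQUE index on `short_url` and treating a duplicate-key error on INSERT as “try the next candidate” closes that gap.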
Read the whole article, since at the bottom the author explains how to make your hashes more unpredictable. I would suggest using a different hashing algorithm than `md5()`, but you will have to experiment yourself. 🙂
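One way to combine the “different hashing algorithm” suggestion with the per-user requirement from the question is to hash the user id together with the URL using `hash('sha256', …)`. This is a sketch under assumptions, not necessarily what the article describes: the function name `userShortUrlCandidates`, the `|` separator, and the integer user id are hypothetical, and it assumes PHP 7.4+ on a 64-bit build. Because the user id is part of the input, the same viral video URL produces different candidates for different users.

```php
<?php
// Hypothetical helper: sha256 instead of md5, with the user id mixed into the
// input so the same long URL yields different candidates for different users.
function userShortUrlCandidates(string $url, int $userId, int $count = 4, int $length = 6): array
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    // '|' separates the user id from the URL in the hashed input.
    $hex = hash('sha256', $userId . '|' . $url); // 64 hex chars = 256 bits

    $candidates = [];
    // At most 8 candidates: 64 hex chars / 8 chars per 32-bit slice.
    for ($i = 0; $i < $count && ($i + 1) * 8 <= strlen($hex); $i++) {
        $n = hexdec(substr($hex, $i * 8, 8)); // 32-bit slice
        $code = '';
        for ($j = 0; $j < $length; $j++) {
            $code .= $alphabet[$n % 62]; // base-62 digit, least significant first
            $n = intdiv($n, 62);
        }
        $candidates[] = $code;
    }
    return $candidates;
}

print_r(userShortUrlCandidates('https://example.com/viral-video', 1));
print_r(userShortUrlCandidates('https://example.com/viral-video', 2));
```

Different users can still collide by chance, so the lookup loop is still needed. The same user shortening the same URL twice gets the same candidates, which lets you detect and reuse their existing row if that is the behavior you want.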