I have a very large set of string URL patterns like {http://www.imdb.com, http://www.amazon.com,…} in a list.
I am getting input URLs like this:
http://www.imdb.com/title/tt1409024/
For the purposes of my application, this URL is derived from http://www.imdb.com, so comparing the two should count as a match.
To implement this, I can extract the base URL from the input URL:
http://www.imdb.com/title/tt1409024/ => http://www.imdb.com
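The extraction step above could be sketched roughly as follows, using `java.net.URI` from the standard library. The class name `BaseUrls` and the method `baseOf` are hypothetical names chosen for illustration, and lowercasing the scheme and host is an assumption about how the master list is normalized:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

final class BaseUrls {
    private BaseUrls() {}

    /**
     * Returns "scheme://host" (plus ":port" if one is given explicitly)
     * for an absolute URL, or null if the input cannot be parsed or
     * has no scheme or host.
     */
    static String baseOf(String url) {
        try {
            URI uri = new URI(url.trim());
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) {
                return null; // relative or otherwise unusable URL
            }
            // Assumption: master-list entries are stored in lowercase.
            String base = scheme.toLowerCase(Locale.ROOT)
                    + "://" + host.toLowerCase(Locale.ROOT);
            if (uri.getPort() != -1) {
                base += ":" + uri.getPort();
            }
            return base;
        } catch (URISyntaxException e) {
            return null; // malformed input
        }
    }
}
```

With this sketch, `BaseUrls.baseOf("http://www.imdb.com/title/tt1409024/")` yields `http://www.imdb.com`.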
Now I need to compare this extracted URL with the master list of URLs and, if a match is found, store the base URL in a database. In essence, for each of my input URLs, I look up the extracted base URL in the master list, and if it matches I store it in the database.
To implement the equality/matching logic, I have two possible solutions. Please weigh in as to which is better:
- Put the master list of URLs in an array list, and use the array list contains method
- Put the master list in a database, and use a query to check the input URL against it
Can anyone tell me which one will be better in terms of performance?
Neither of your suggestions would be appropriate. For an ArrayList, you would have to search linearly through half the list (on average) for every URL you want to check.
For a database (presumably on disk?), you would incur a potentially expensive database lookup for every query.
1000 URL patterns isn’t very many. Keep the list in memory and use an appropriate data structure: a HashSet would do a good job, since its contains check takes constant time on average regardless of how many entries it holds.
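A minimal sketch of the in-memory approach, assuming the master list has already been loaded as base URLs in the same normalized form as the extracted ones. The class name `UrlMasterList` and the method `matches` are hypothetical:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class UrlMasterList {
    // Hash-based set: contains() is O(1) on average,
    // versus a linear scan for ArrayList.contains().
    private final Set<String> baseUrls;

    UrlMasterList(Collection<String> masterUrls) {
        // Copy once at startup; the set is only read afterwards.
        this.baseUrls = new HashSet<>(masterUrls);
    }

    /** Returns true if the given base URL is present in the master list. */
    boolean matches(String baseUrl) {
        return baseUrl != null && baseUrls.contains(baseUrl);
    }
}
```

You would build the set once, call `matches` for every extracted base URL, and only touch the database when it returns true.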