I am in the process of migrating data from one database to another.
The data I am migrating is reviews for versions of products.
There are many versions for each review. There are 23k distinct reviews and 60k versions that have reviews, meaning that roughly every 3 versions share a review.
In my Java application, the host database contains the versions, each of which has a reviewId that refers to a review in the review db.
I have a HashMap<Integer, Integer>, and every time I import a review from the review db I add it to the map using map.put(reviewId, hostId).
Before I import from the review db, I check whether the review is already in the HashMap; if it is, I use the already imported review. This starts to get really slow after a while, and I am wondering whether a temp table would be more efficient, or whether there is another way that is more efficient.
Here is the code:
https://gist.github.com/4064373
Thoughts, suggestions?
A MySQL temporary table will in the best case be implemented as a hash map as well. But in contrast to the Java hash map, the implementation will be prepared to handle a larger number of columns. And you have the overhead of communicating with MySQL. So to answer the title of your question, I’d expect a Java HashMap to be more efficient if you’re accessing your data from the application. For correlation of data within the SQL server, things are different.
But as Jon Skeet pointed out in his comment, a simple hash map from integers to integers should not be a serious performance bottleneck for the kind of application you’re describing. So chances are that something else is written in a suboptimal way. I don’t see any obvious problems at first glance, but then, there are a lot of methods you call, and in theory, any one of them might be to blame.
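To check whether the map itself is the slow part, it can help to run the caching pattern in isolation, with the database work swapped out for something trivial. The following is only a rough sketch and not the code from the gist: the class ReviewImportCache, the method hostIdFor and the fake importer are hypothetical names, and the 23k/60k sizes are taken from the question. It assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Hypothetical sketch: remembers reviewId -> hostId so each review is imported only once. */
class ReviewImportCache {
    private final Map<Integer, Integer> hostIdByReviewId = new HashMap<>();
    private final IntUnaryOperator importer; // stand-in for the real (slow) database import
    private int importCount = 0;

    ReviewImportCache(IntUnaryOperator importer) {
        this.importer = importer;
    }

    /** Returns the cached hostId, importing the review first if it has not been seen yet. */
    int hostIdFor(int reviewId) {
        Integer cached = hostIdByReviewId.get(reviewId);
        if (cached != null) {
            return cached;
        }
        int hostId = importer.applyAsInt(reviewId);
        importCount++;
        hostIdByReviewId.put(reviewId, hostId);
        return hostId;
    }

    int importCount() {
        return importCount;
    }

    public static void main(String[] args) {
        // Fake importer: no database involved, so only the map overhead is measured.
        ReviewImportCache cache = new ReviewImportCache(id -> id + 1_000_000);

        long start = System.nanoTime();
        for (int version = 0; version < 60_000; version++) {
            cache.hostIdFor(version % 23_000); // 60k versions spread over 23k reviews
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("imports: " + cache.importCount() + ", elapsed ms: " + elapsedMs);
    }
}
```

If this loop finishes quickly on your machine while the real import keeps slowing down, the time is being spent in the methods behind the importer rather than in the map, and timing those calls individually would be the next step.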