The title might be a bit misleading, but I couldn’t figure out a better one. I’m writing a simple search engine that searches several sites for a specific domain. To be concrete: I’m writing a search engine for hardstyle livesets/aftermovies/tracks. To do this, I will search the sites that provide livesets, tracks, and such. The problem here is speed: I need to pass the search query to 5-7 sites, get the results, and then use my own algorithm to display them in sorted order. I could just “multithread” it, but that’s easier said than done, so I have a few questions.
- What would be the best solution to this problem? Should I just make the application multithreaded or multiprocess to get a bit of a speed-up?
- Are there any other solutions, or am I doing something really wrong?
Thanks,
William van Doorn
Unless you’re trying to learn multithreading, avoid writing the infrastructure for this yourself. Synchronizing lots of tasks that can take different amounts of time, handling failures, and so on quickly becomes a mess.
For largely parallelizable tasks (such as querying multiple sites and combining the results), you may want to look at existing infrastructure.
Map/reduce frameworks (such as Hadoop for Java) can handle some of this for you, letting you focus on the logic of your application.
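For only 5-7 sites, a standard-library thread pool may already be enough. Here is a minimal sketch in Python, swapping in `concurrent.futures` rather than a map/reduce framework. The `search_site_*` functions, the `(title, score)` result shape, and the score-based sort are all hypothetical stand-ins for your real site queries and ranking algorithm.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


# Hypothetical stand-ins for real site searches.
# Each one returns a list of (title, score) pairs.
def search_site_a(query):
    time.sleep(0.2)  # simulates network latency
    return [(f"{query} liveset A", 0.9)]


def search_site_b(query):
    time.sleep(0.2)  # simulates network latency
    return [(f"{query} aftermovie B", 0.7)]


def search_site_broken(query):
    # Simulates a site that is down.
    raise RuntimeError("site unavailable")


def search_all(query, searchers):
    """Query every site concurrently and merge whatever comes back."""
    results = []
    with ThreadPoolExecutor(max_workers=len(searchers)) as pool:
        futures = [pool.submit(searcher, query) for searcher in searchers]
        for future in as_completed(futures):
            try:
                results.extend(future.result())
            except Exception:
                # A failing site is skipped instead of failing the whole search.
                continue
    # Placeholder ranking: highest score first.
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sites = [search_site_a, search_site_b, search_site_broken]
    for title, score in search_all("defqon", sites):
        print(score, title)
```

Because the work is dominated by waiting on the network, threads overlap that waiting, so the total time should be close to that of the slowest site rather than the sum of all of them. The pool handles the synchronization, and the `try`/`except` around `future.result()` keeps one broken site from taking down the rest.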