I built a single-threaded web crawler and am now extending it to use multiple threads. There are two things I don't understand:
- How many threads should I create? Should it be a fixed number or a dynamic one changing according to the length of the Queue holding the URIs? (Taking into consideration the available memory also)
- I have created a new class for the thread by implementing the Runnable interface, and I want each thread’s run() method to access an object I created in my Main class, which is calling thread.start(). How should I access this object from each thread?
I am using NetBeans.
You’re definitely going to want concurrency with a web crawler 🙂
And you’re probably going to want to set up a thread pool so that you can reuse threads and not pay the cost of instantiating a new thread for each task.
The main thread pool options that you have are a FixedThreadPool and a CachedThreadPool. The benefits of each are explained in detail in the Java Concurrency Tutorial. The big drawback of the CachedThreadPool is that there’s no limit on how many threads it can create; if a very large number of tasks are submitted at once, the pool may spawn a thread for each one, and you might see significant performance degradation or timeouts (if you have a socket timeout defined).
In either case, the best practice for setting up thread pools is to go through java.util.concurrent.Executors.
It’s just a matter of creating an ExecutorService by calling one of the following:
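A minimal sketch of the two factory calls. The pool size of four threads per core is only an assumed starting point for an I/O-bound crawler, and the class and method names (PoolSetup, suggestedPoolSize) are made up for illustration; tune the number against your own memory and bandwidth.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolSetup {
    // Assumption: crawling is mostly waiting on the network, so a few
    // threads per core is a reasonable first guess. Measure and adjust.
    static int suggestedPoolSize() {
        return Runtime.getRuntime().availableProcessors() * 4;
    }

    // Fixed pool: never runs more than the given number of threads.
    static ExecutorService newFixedPool() {
        return Executors.newFixedThreadPool(suggestedPoolSize());
    }

    // Cached pool: creates threads on demand with no upper bound.
    static ExecutorService newCachedPool() {
        return Executors.newCachedThreadPool();
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = newFixedPool();
        // ... submit crawl tasks here ...
        pool.shutdown();                            // stop accepting new tasks
        pool.awaitTermination(5, TimeUnit.SECONDS); // wait for running ones
    }
}
```

With the fixed pool, extra tasks simply wait in the pool's internal queue, so the thread count does not need to follow the length of your URI queue.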
Once you have the thread pool, you can submit either a single Runnable (which doesn’t return a result) or a Callable (which does) by using the submit() method.
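Roughly, the two flavours of submit() look like this. fetchLength is a hypothetical stand-in for real page fetching (it just returns the URL's length), and the class name is invented for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubmitExample {
    // Hypothetical placeholder for downloading and measuring a page.
    static int fetchLength(String url) {
        return url.length();
    }

    static int runBoth(String url) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Runnable logTask = () -> System.out.println("crawling " + url);
            Callable<Integer> lengthTask = () -> fetchLength(url);

            Future<?> logged = pool.submit(logTask);          // no result, get() yields null
            Future<Integer> length = pool.submit(lengthTask); // carries the return value

            logged.get();        // blocks until the Runnable has finished
            return length.get(); // blocks until the Callable's result is ready
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runBoth("http://example.com/"));
    }
}
```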
You can also run .invokeAll() if you’re using callables to generate futures:
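Something along these lines, assuming each task just returns a string for its URL (the real crawler would fetch the page instead). crawlAll and the example URLs are hypothetical names for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllExample {
    // Builds one Callable per URL and hands the whole batch to the pool.
    static List<Future<String>> crawlAll(ExecutorService pool, List<String> urls)
            throws InterruptedException {
        List<Callable<String>> tasks = new ArrayList<>();
        for (String url : urls) {
            tasks.add(() -> "fetched " + url); // placeholder for a real fetch
        }
        // invokeAll blocks until every task has completed and returns the
        // futures in the same order as the tasks were listed.
        return pool.invokeAll(tasks);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures =
                    crawlAll(pool, Arrays.asList("http://a.example/", "http://b.example/"));
            System.out.println(futures.size() + " futures returned");
        } finally {
            pool.shutdown();
        }
    }
}
```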
And then get the results:
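A sketch of reading the results back out of the futures. The tasks here only compute the length of each URL string as a placeholder, and collect / lengthsOf are names invented for the example. Note that get() wraps any exception thrown inside a task in an ExecutionException, so one failed page need not abort the whole batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CollectResults {
    // Pulls the value out of each future, skipping tasks that failed.
    static List<Integer> collect(List<Future<Integer>> futures) throws InterruptedException {
        List<Integer> results = new ArrayList<>();
        for (Future<Integer> future : futures) {
            try {
                results.add(future.get()); // blocks until this task is done
            } catch (ExecutionException e) {
                System.err.println("task failed: " + e.getCause());
            }
        }
        return results;
    }

    // End-to-end helper: one placeholder task per URL, then gather results.
    static List<Integer> lengthsOf(List<String> urls) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> url.length());
            }
            return collect(pool.invokeAll(tasks));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(lengthsOf(Arrays.asList("http://a.example/", "http://b.example/")));
    }
}
```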
If you want multiple threads to be able to modify the same object, you’ll need to either synchronize access to it (for example, with the synchronized keyword on the methods that read or change its state) or use thread-safe data types.
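As for reaching an object created in your main class from each thread: one common approach is to pass it to the Runnable through its constructor. A minimal sketch, assuming the shared object is a set of already-visited URLs; CrawlWorker and SharedStateExample are hypothetical names, and the set comes from ConcurrentHashMap.newKeySet(), which is thread-safe without extra locking.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SharedStateExample {
    static Set<String> crawl(String... urls) throws InterruptedException {
        // Created once here, then shared by every worker.
        Set<String> visited = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String url : urls) {
            pool.execute(new CrawlWorker(visited, url));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return visited;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> visited = crawl("http://a.example/", "http://b.example/", "http://a.example/");
        System.out.println(visited.size() + " distinct URLs visited");
    }
}

// Hypothetical worker: receives the shared object through its constructor.
class CrawlWorker implements Runnable {
    private final Set<String> visited;
    private final String url;

    CrawlWorker(Set<String> visited, String url) {
        this.visited = visited;
        this.url = url;
    }

    @Override
    public void run() {
        // add() is atomic on this set: it returns true only for the first
        // thread that inserts a given URL, so each page is handled once.
        if (visited.add(url)) {
            System.out.println("visiting " + url);
        }
    }
}
```

The same constructor-passing idea works for a shared URI queue; a ConcurrentLinkedQueue or a BlockingQueue would be the thread-safe choice there.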
Hope this helps. Good luck!!