I built a web crawler but it is single threaded. Now I am extending

Question

Asked: 2026-06-12T08:04:15+00:00

I built a web crawler, but it is single-threaded. Now I am extending it to work with multiple threads. I do not understand the following:

How many threads should I create? Should it be a fixed number, or a dynamic one that changes with the length of the queue holding the URIs (also taking the available memory into account)?
I have created a new class for the thread by implementing the Runnable interface, and I want each thread's run() method to access an object I created in my Main class, which is the class calling thread.start(). How should I access this object from each thread?

I am using NetBeans.

1 Answer

Editorial Team · Answer 1 · 2026-06-12T08:04:16+00:00

You’re definitely going to want concurrency with a web crawler 🙂

And you’re probably going to want to set up a thread pool, so that you can reuse threads and not pay the cost of creating a new thread for each task.

The thread pool options that you have are a FixedThreadPool and a CachedThreadPool. The benefits of each are explained in detail in the Java Concurrency Tutorial. The big drawback of the CachedThreadPool is that there is no limit on how many threads it can create; if a very large number of tasks are submitted at once, the pool can grow to a very large number of threads, and you might see significant performance degradation or timeouts (if you have a socket timeout defined).

In either case, the best practice for setting up thread pools is to go through java.util.concurrent.Executors.

It’s just a matter of creating an ExecutorService by calling one of the following:

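// Pick one of the two; declaring both in the same scope will not compile.
ExecutorService threadPool = Executors.newCachedThreadPool();      // no upper bound on threads
// ExecutorService threadPool = Executors.newFixedThreadPool(500); // at most 500 threads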

Once you have the thread pool, you can hand it either a Runnable (which doesn’t return a result) or a Callable (which does) by using the submit() method.
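To make the Runnable versus Callable difference concrete, here is a minimal sketch. The class name SubmitSketch, the helper fetchLength(), and the pool size of 4 are all made up for illustration; fetchLength() only stands in for a real page download.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubmitSketch {

    // Hypothetical stand-in for a real page download.
    static int fetchLength(String url) {
        return url.length();
    }

    static int runOnce(String url) {
        // Pool size 4 is an arbitrary choice for this sketch.
        ExecutorService threadPool = Executors.newFixedThreadPool(4);
        try {
            // Runnable: runs on a pool thread, nothing comes back.
            Runnable log = () -> System.out.println("Crawling " + url);
            threadPool.submit(log);

            // Callable: the returned Future carries the task's result.
            Callable<Integer> task = () -> fetchLength(url);
            Future<Integer> result = threadPool.submit(task);
            return result.get(); // blocks until the task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            threadPool.shutdown(); // let the pool threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce("http://example.com/"));
    }
}
```

In a real crawler you would create the pool once and keep it around, rather than creating and shutting it down per call as this sketch does.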

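If you have a collection of Callables, you can also call invokeAll() to submit them all at once and get back a list of Futures:

// Assumes tasks is a Collection<Callable<String>>; use your own result type.
List<Future<String>> futures = threadPool.invokeAll(tasks,
                                                    timeout,
                                                    TimeUnit.MILLISECONDS);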

And then get the results:

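// get() throws ExecutionException if the task failed, and
// CancellationException if invokeAll() cancelled it at the timeout.
for (Future<String> f : futures) {
    someList.add(f.get());
}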
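Putting the invokeAll() pieces together, a self-contained version might look like the sketch below. InvokeAllSketch, fetchTitle(), and the pool size are assumptions made for the example, and String is just a placeholder result type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class InvokeAllSketch {

    // Hypothetical stand-in for downloading a page and extracting its title.
    static String fetchTitle(String url) {
        return "title of " + url;
    }

    static List<String> crawl(List<String> urls, long timeoutMillis) {
        ExecutorService threadPool = Executors.newFixedThreadPool(4);
        List<String> someList = new ArrayList<>();
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetchTitle(url));
            }
            // Blocks until every task is done or the timeout expires;
            // tasks still unfinished at that point are cancelled.
            List<Future<String>> futures =
                    threadPool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            // Futures come back in the same order as the tasks.
            for (Future<String> f : futures) {
                try {
                    someList.add(f.get());
                } catch (CancellationException | ExecutionException e) {
                    // Timed out or failed: skip this URL, keep the others.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        } finally {
            threadPool.shutdown();
        }
        return someList;
    }

    public static void main(String[] args) {
        System.out.println(crawl(List.of("http://example.com/"), 5000));
    }
}
```

Skipping failed or cancelled tasks is only one possible policy; a crawler might prefer to log them or put the URL back on the queue.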

If you want multiple threads to be able to modify the same object, you’ll either need to use the synchronized keyword on the methods that read and modify it, or use thread-safe data types (for example, the collections in java.util.concurrent).
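As for how each Runnable gets at the object created in your Main class, one common approach is to pass it in through the constructor. Below is a rough sketch of that idea using thread-safe collections. CrawlState, CrawlWorker, and every method name on them are invented for the example, not taken from your code.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SharedStateSketch {
    static int crawl(String... seeds) {
        CrawlState state = new CrawlState(); // created once, like the object in Main
        for (String seed : seeds) {
            state.addUri(seed);
        }
        ExecutorService threadPool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            threadPool.submit(new CrawlWorker(state)); // every worker gets the same instance
        }
        threadPool.shutdown();
        try {
            threadPool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return state.visitedCount();
    }

    public static void main(String[] args) {
        System.out.println(crawl("http://example.com/a", "http://example.com/b"));
    }
}

// Hypothetical shared state, built only from thread-safe collections.
class CrawlState {
    private final Queue<String> frontier = new ConcurrentLinkedQueue<>();
    private final Set<String> visited = ConcurrentHashMap.newKeySet();

    void addUri(String uri) { frontier.add(uri); }
    String nextUri() { return frontier.poll(); }                 // null when the queue is empty
    boolean markVisited(String uri) { return visited.add(uri); } // true only for the first caller
    int visitedCount() { return visited.size(); }
}

class CrawlWorker implements Runnable {
    private final CrawlState state;

    CrawlWorker(CrawlState state) {
        this.state = state;
    }

    @Override
    public void run() {
        String uri;
        while ((uri = state.nextUri()) != null) {
            if (state.markVisited(uri)) {
                // A real crawler would download and parse the page here,
                // then call state.addUri(...) for each link it finds.
            }
        }
    }
}
```

Note that these workers stop as soon as the queue is momentarily empty. That is fine for the sketch, but a real crawler needs a better termination rule, because another worker may still be about to add links.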

Hope this helps. Good luck!!
