I would like to implement a multithreaded crawler using the single-threaded crawler code I have now. Basically, I read the URLs from a text file, take each one, and crawl and parse it. I know the basics of creating a thread and assigning a method to it, but I am not sure how to implement the following:
I need at least 3 threads, each assigned a URL from a list of URLs. Each thread then needs to fetch and parse its URL before adding the contents to a database.
Dim gthread, tthread, ithread As Thread
gthread = New Thread(AddressOf processUrl)
gthread.Start(url)
tthread = New Thread(AddressOf processUrl)
tthread.Start(url)
ithread = New Thread(AddressOf processUrl)
ithread.Start(url)
WaitUntilAllAreOver:
If gthread.ThreadState = ThreadState.Running Then
Thread.Sleep(5)
GoTo WaitUntilAllAreOver
End If
' etc.
Now, the code may not make sense, but what I need to do is give each thread a unique URL to process.
Any ideas appreciated.
The best way to wait for the Thread instances to finish is to call the .Join method on each of them, rather than polling ThreadState in a Sleep/GoTo loop.
You may also want to consider using the ThreadPool here. The ThreadPool is designed for spawning off lots of small tasks very efficiently.
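Here is a minimal sketch of the .Join approach, with a few assumptions of mine: it targets .NET 4 or later (for ConcurrentQueue), the ProcessUrl body is only a placeholder for your existing fetch/parse/store code, and the example.com URLs stand in for the lines of your text file. Instead of handing one fixed URL to each thread, the three workers pull from a shared thread-safe queue, so every URL goes to exactly one thread.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.Threading

Module CrawlerSketch
    ' Thread-safe queue: each successful TryDequeue hands a URL to exactly one thread.
    Private ReadOnly pending As New ConcurrentQueue(Of String)()

    ' Placeholder (assumption): replace the body with your existing
    ' single-threaded fetch, parse and database code.
    Private Sub ProcessUrl(ByVal url As String)
        Console.WriteLine("Thread {0} processing {1}",
                          Thread.CurrentThread.ManagedThreadId, url)
    End Sub

    ' Worker loop: keep taking URLs until the queue is empty.
    Private Sub Worker()
        Dim url As String = Nothing
        While pending.TryDequeue(url)
            ProcessUrl(url)
        End While
    End Sub

    Sub Main()
        ' Hypothetical input; in the real crawler these come from the text file.
        Dim urls As String() = {"http://example.com/a", "http://example.com/b",
                                "http://example.com/c", "http://example.com/d"}
        For Each u As String In urls
            pending.Enqueue(u)
        Next

        ' Start three workers that share the queue.
        Dim workers As New List(Of Thread)()
        For i As Integer = 1 To 3
            Dim t As New Thread(AddressOf Worker)
            t.Start()
            workers.Add(t)
        Next

        ' Join blocks until the given thread has finished.
        For Each t As Thread In workers
            t.Join()
        Next

        Console.WriteLine("All URLs processed.")
    End Sub
End Module
```

If you go with the ThreadPool instead, each URL could be queued with ThreadPool.QueueUserWorkItem. Pool threads cannot be joined, though, so you would need some other signal (a CountdownEvent, for example) to know when all the work is done.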