I need to read about 200,000 records from a website and store them in a DB. The application is a desktop app built on the NetBeans Rich Client Platform. Using the Apache HttpComponents library, I can send a request to the website and retrieve the response containing the record information; then, using regex, I can fairly easily extract the dozen or so fields I need from the HTML.
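A minimal sketch of the regex extraction step, using only java.util.regex. The HTML layout (two table cells with class "name" and "price") and the class name RecordExtractor are hypothetical. The real pattern depends on the site's markup, and regex-based extraction will break if that markup changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: pull fields out of a response body with java.util.regex. */
final class RecordExtractor {
    // Hypothetical markup: <td class="name">...</td><td class="price">...</td>
    // Compiled once and reused: a Pattern is immutable and safe to share between threads
    // (a Matcher is not, so each call creates its own).
    private static final Pattern ROW = Pattern.compile(
            "<td class=\"name\">(.*?)</td>\\s*<td class=\"price\">(.*?)</td>",
            Pattern.DOTALL);

    /** Returns one String[]{name, price} per matching row, in document order. */
    static List<String[]> extract(String html) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            rows.add(new String[] {m.group(1).trim(), m.group(2).trim()});
        }
        return rows;
    }
}
```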
I am thinking of having two worker threads besides the GUI thread. One worker thread handles the HTTP request/response part and also extracts the records from the HTML using regex, while the other stores the records in the DB. So there will be a data structure holding the records that is shared between the two worker threads. I am also considering giving the HTTP worker thread its own buffer of, say, 100 records: when the buffer is full, it transfers all 100 records at once to the shared record holder.
Please comment on my design. My specific questions are:
- What is the proper data structure to hold the records?
- How do I synchronize it between the two worker threads?
- How should the multiple threads be implemented in the modular system of the NetBeans Platform?
Depends on the data. Probably a simple class with a bunch of fields (preferably immutable, which makes sharing instances between threads safer).
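For instance, a hypothetical record class might look like the sketch below. The class name and field names (id, title, url) are placeholders for whatever the regex actually extracts. Because every field is final and there are no setters, an instance can be handed from the HTTP thread to the DB thread and read there without any locking.

```java
import java.util.Objects;

/** Hypothetical immutable record; field names are placeholders for the scraped fields. */
final class ScrapedRecord {
    private final long id;
    private final String title;
    private final String url;

    ScrapedRecord(long id, String title, String url) {
        this.id = id;
        // Reject nulls up front so the DB thread never sees a half-filled record.
        this.title = Objects.requireNonNull(title, "title");
        this.url = Objects.requireNonNull(url, "url");
    }

    long getId() { return id; }
    String getTitle() { return title; }
    String getUrl() { return url; }

    @Override
    public String toString() {
        return "ScrapedRecord[id=" + id + ", title=" + title + ", url=" + url + "]";
    }
}
```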
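One of the BlockingQueue implementations might be good for that. They handle the synchronization internally: the HTTP thread calls put() and the DB thread calls take(), with no extra locking on your side. ArrayBlockingQueue can be used as a fixed-size buffer for passing work between the threads.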
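A minimal sketch of that hand-off, assuming records are represented as plain Strings and with saveBatch() standing in for the real DB insert (both hypothetical, as are the class name and queue capacity). The queue does all the synchronization: put() blocks while the queue is full, take() blocks while it is empty, and drainTo() lets the DB worker pick up to 100 records at a time, so the HTTP worker does not need its own separate buffer. A sentinel value tells the DB worker when the HTTP worker has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Sketch of the HTTP-worker / DB-worker hand-off; Strings stand in for real records. */
final class RecordPipeline {
    // Sentinel ("poison pill") telling the DB worker that no more records will come.
    // Compared by identity (==), so a scraped record with the same text is not mistaken for it.
    private static final String END = new String("__END__");
    private static final int BATCH_SIZE = 100;

    // Bounded queue: put() blocks when it is full, so a fast HTTP worker cannot outrun the DB.
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1_000);
    // Written only by the DB worker; read by the caller after join(), which makes the writes visible.
    private final List<String> saved = new ArrayList<>();

    /** HTTP worker side: hand each extracted record over, then signal completion. */
    void produce(List<String> scraped) throws InterruptedException {
        for (String record : scraped) {
            queue.put(record);
        }
        queue.put(END);
    }

    /** DB worker side: block for one record, then grab up to a full batch of whatever is ready. */
    void consume() throws InterruptedException {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        while (true) {
            batch.add(queue.take());              // waits until at least one element is available
            queue.drainTo(batch, BATCH_SIZE - 1); // non-blocking: adds at most 99 more
            // END is the last element ever queued, so if present it is at the end of the batch.
            boolean done = batch.get(batch.size() - 1) == END;
            if (done) {
                batch.remove(batch.size() - 1);
            }
            if (!batch.isEmpty()) {
                saveBatch(batch);
            }
            batch.clear();
            if (done) {
                return;
            }
        }
    }

    /** Hypothetical placeholder: real code would do a JDBC batch insert and commit here. */
    private void saveBatch(List<String> batch) {
        saved.addAll(batch);
    }

    /** Runs both workers on their own threads and returns everything the DB side "saved". */
    static List<String> runDemo(int recordCount) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < recordCount; i++) {
            input.add("record-" + i);
        }
        RecordPipeline pipeline = new RecordPipeline();
        Thread http = new Thread(() -> {
            try {
                pipeline.produce(input);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "http-worker");
        Thread db = new Thread(() -> {
            try {
                pipeline.consume();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "db-worker");
        http.start();
        db.start();
        try {
            http.join();
            db.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return pipeline.saved;
    }
}
```

One gap in the sketch: if the DB worker dies, the HTTP worker will eventually block forever in put(). Real code should watch for that, for example by interrupting the HTTP thread or by using offer() with a timeout instead of put().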
No idea whether the NetBeans Platform has anything specific to say about that. Launching your own threads should work.
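If you would rather not manage Thread objects directly, a plain java.util.concurrent.ExecutorService works too. The sketch below (class and method names are hypothetical) runs the two workers on a two-thread pool, waits for whichever finishes first, and always shuts the pool down, which interrupts the other worker if one of them failed.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: run the HTTP and DB workers on a two-thread pool instead of raw Thread objects. */
final class WorkerLauncher {

    /** Runs both tasks concurrently and returns true only if both finished normally. */
    static boolean runBoth(Runnable httpWorker, Runnable dbWorker) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletionService<Void> finished = new ExecutorCompletionService<>(pool);
        try {
            finished.submit(httpWorker, null);
            finished.submit(dbWorker, null);
            for (int i = 0; i < 2; i++) {
                // take() yields whichever task completes first; get() rethrows its failure
                // as an ExecutionException, so a crash on either side is noticed right away.
                finished.take().get();
            }
            return true;
        } catch (ExecutionException e) {
            System.err.println("worker failed: " + e.getCause());
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupts a worker that is still running (e.g. one side blocked on the queue
            // after the other failed) and lets the pool threads exit.
            pool.shutdownNow();
        }
    }
}
```

Since the NetBeans Platform UI is Swing, anything the workers want to show (progress, errors) has to be passed back to the GUI with SwingUtilities.invokeLater rather than touching components from these threads. The Platform also has its own thread pool class, org.openide.util.RequestProcessor, which may be worth a look in the NetBeans API docs; this sketch sticks to the plain JDK.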