I wrote a Java program to add and retrieve data from an MS Access database. At present it goes sequentially through ~200K insert queries in ~3 minutes, which I think is slow. I plan to rewrite it using 3-4 threads, each handling a different part of the hundreds of thousands of records. I have a compound question:
- Will this help speed up the program because of the divided workload, or would it be the same because the threads still have to access the database sequentially?
- What strategy do you think would speed up this process (other than query optimization, which I already did, in addition to using Java’s PreparedStatement)?
First, don’t use Access. Move your data anywhere else: SQL Server, MySQL, anything. The database engine inside Access (called Jet) is pitifully slow. It’s not a real database; it’s for personal projects that involve small amounts of data. It doesn’t scale at all.
Second, threads rarely help.
The JDBC-to-database connection is a process-wide resource: all threads share the one connection.
‘But wait,’ you say, ‘I’ll create a unique Connection object in each thread.’
Noble, but sometimes doomed to failure. Why? The operating-system layer between your JVM and the database may involve a socket that’s a single, process-wide resource, shared by all your threads.
If you have a single OS-level I/O resource that’s shared across all threads, you won’t see much improvement. In this case, the ODBC connection is one bottleneck, and MS Access is the other.
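To make the shared-resource point concrete, here is a minimal sketch. It does not talk to a database at all: the `sharedChannel` lock is a hypothetical stand-in for whatever single I/O path sits between the JVM and Access, and `SharedResourceDemo`, `insert` and `run` are names made up for this illustration. It splits the records across worker threads the way you describe, then records how many inserts were ever in progress at the same moment.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedResourceDemo {
    // Hypothetical stand-in for the single process-wide I/O channel
    // (assumption: this models the bottleneck; it is not a JDBC/ODBC object).
    private final Object sharedChannel = new Object();
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();
    private final AtomicInteger written = new AtomicInteger();

    // Simulated insert: a thread must hold the channel for the whole call.
    void insert(int recordId) {
        synchronized (sharedChannel) {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max);
            written.incrementAndGet(); // the "work" done while holding the channel
            inFlight.decrementAndGet();
        }
    }

    int written() { return written.get(); }
    int maxInFlight() { return maxInFlight.get(); }

    // Splits `total` records into contiguous chunks, one per worker thread,
    // and waits for all workers to finish.
    static SharedResourceDemo run(int total, int threads) {
        SharedResourceDemo demo = new SharedResourceDemo();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (total + threads - 1) / threads;
        for (int t = 0; t < threads; t++) {
            final int from = t * chunk;
            final int to = Math.min(total, from + chunk);
            pool.submit(() -> {
                for (int id = from; id < to; id++) {
                    demo.insert(id);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo;
    }

    public static void main(String[] args) {
        SharedResourceDemo demo = run(200_000, 4);
        // prints "written=200000 maxInFlight=1"
        System.out.println("written=" + demo.written()
                + " maxInFlight=" + demo.maxInFlight());
    }
}
```

All four threads finish their share, but `maxInFlight` never goes above 1: the work is divided, yet it still passes through the channel one insert at a time. Whether your real driver stack behaves like this lock is something to measure, not assume. The sketch only shows why dividing the workload does not by itself buy parallelism.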