I have a requirement to write a batch job that fetches rows from a database table and, based on certain conditions, writes to other tables or updates the row with a certain value. We are using Spring and JDBC to fetch the result set, then iterating through it and processing the records in a standalone Java program that is scheduled to run weekly. I know this is not the right way to do it, but we had to do it as a temporary solution. As the table grows into the millions of rows, we will end up with out-of-memory errors.
Can anyone recommend the best way to handle this situation?
Use threads that each fetch 1000 records and process them in parallel?
(OR)
Use some other batch mechanism for this (I know there is Spring Batch, but I have never used it)
(OR)
Any other ideas?
This sounds like the sort of thing you should do inside the database. For example, to update rows that match certain conditions, SQL has the UPDATE ... WHERE ... statement. To write to another table, you can use INSERT ... SELECT ... instead. These may get fairly complicated, but I suggest doing everything in your power to do this inside the database, since pulling the data out to filter it is incredibly slow and defeats the purpose of having a relational database.
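A minimal sketch of that idea with plain JDBC, assuming a hypothetical `source_rows` table with `id`, `status` and `amount` columns and a hypothetical `archive_rows` target table. The Java program only sends two set-based statements, so no rows are loaded into the JVM however large the table gets; the `status`/`amount` condition stands in for your real business rules.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Sketch: let the database do the filtering, copying and updating.
 * source_rows, archive_rows and their columns are hypothetical names.
 */
final class SetBasedJob {

    // Copy every qualifying row to another table in a single statement.
    static final String COPY_SQL =
        "INSERT INTO archive_rows (id, amount) "
      + "SELECT id, amount FROM source_rows "
      + "WHERE status = ? AND amount > ?";

    // Update the same rows in place, again in a single statement.
    static final String UPDATE_SQL =
        "UPDATE source_rows SET status = ? "
      + "WHERE status = ? AND amount > ?";

    /**
     * Runs both statements in one transaction and returns {rowsCopied, rowsUpdated}.
     * Assumes nothing else inserts matching 'NEW' rows while the job runs;
     * otherwise raise the isolation level or narrow the WHERE clause.
     */
    static int[] run(Connection conn, long threshold) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement copy = conn.prepareStatement(COPY_SQL);
             PreparedStatement update = conn.prepareStatement(UPDATE_SQL)) {
            copy.setString(1, "NEW");
            copy.setLong(2, threshold);
            int copied = copy.executeUpdate();

            update.setString(1, "ARCHIVED");
            update.setString(2, "NEW");
            update.setLong(3, threshold);
            int updated = update.executeUpdate();

            conn.commit();
            return new int[] {copied, updated};
        } catch (SQLException e) {
            conn.rollback(); // neither the copy nor the update takes effect
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Running both statements in one transaction means a failure leaves the table exactly as it was. If you are already on Spring, each statement can also be issued with a single `JdbcTemplate.update(sql, args...)` call.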
Note: Make sure to experiment with this on a non-production system first, and implement any limits you need so you don’t lock up production tables at bad times.
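One way to implement such limits, sketched here under the same hypothetical schema, is to walk the primary key in fixed-size ranges and commit after each range, which keeps every transaction small instead of holding one long transaction over the whole table. Key ranges are used because row-limited `UPDATE` syntax differs between databases (MySQL accepts `UPDATE ... LIMIT`, PostgreSQL does not). `ChunkedUpdate`, `RangeUpdate` and the chunk size are illustrative names and knobs, not part of any library.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Sketch: apply the update in primary-key ranges, committing after each range.
 * Table and column names are hypothetical placeholders.
 */
final class ChunkedUpdate {

    static final String CHUNK_SQL =
        "UPDATE source_rows SET status = 'ARCHIVED' "
      + "WHERE id BETWEEN ? AND ? AND status = 'NEW'";

    /** One unit of work over the inclusive id range [fromId, toId]. */
    interface RangeUpdate {
        int apply(long fromId, long toId) throws SQLException;
    }

    /** Splits [minId, maxId] into ranges of chunkSize ids; returns total rows touched. */
    static long forEachRange(long minId, long maxId, long chunkSize, RangeUpdate step)
            throws SQLException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long total = 0;
        long from = minId;
        while (from <= maxId) {
            // Clamp the last range to maxId.
            long to = (maxId - from < chunkSize) ? maxId : from + chunkSize - 1;
            total += step.apply(from, to);
            if (to == maxId) {
                break; // stop here so to + 1 cannot overflow when maxId is Long.MAX_VALUE
            }
            from = to + 1;
        }
        return total;
    }

    /** Returns {MIN(id), MAX(id)} of source_rows, or null if the table is empty. */
    static long[] idBounds(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT MIN(id), MAX(id) FROM source_rows")) {
            rs.next(); // an aggregate without GROUP BY always returns one row
            long min = rs.getLong(1);
            if (rs.wasNull()) {
                return null;
            }
            return new long[] {min, rs.getLong(2)};
        }
    }

    /** Runs CHUNK_SQL range by range, committing after each chunk. */
    static long run(Connection conn, long chunkSize) throws SQLException {
        long[] bounds = idBounds(conn);
        if (bounds == null) {
            return 0;
        }
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(CHUNK_SQL)) {
            return forEachRange(bounds[0], bounds[1], chunkSize, (from, to) -> {
                ps.setLong(1, from);
                ps.setLong(2, to);
                int rows = ps.executeUpdate();
                conn.commit(); // release this chunk's locks before the next one
                return rows;
            });
        } catch (SQLException e) {
            conn.rollback(); // undoes only the chunk in flight; earlier chunks stay committed
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Because the `status = 'NEW'` predicate skips rows that were already archived, a run that fails halfway can simply be restarted. The chunk size is the knob to tune on the non-production system before pointing the job at production.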