I am using spring batch to process multiple files using MultiResourcePartitioner and all the

Question

Asked: May 26, 2026 (2026-05-26T15:47:35+00:00)

I am using Spring Batch to process multiple files with MultiResourcePartitioner, and all the item readers and writers are step-scoped. Each step processes one file and commits to the database at an interval of 1000. When any error occurs while a file is being processed, all of that file's previous commits need to be rolled back and the step should fail, so that none of the file's contents are added to the database.

Which is the best approach among these:

Using transaction propagation NESTED.
Setting the chunk commit interval to Integer.MAX_VALUE. This will not work, as the files are large and the job fails with a heap space error.
Any other way to have a transaction at the step level.

My sample XML configuration is shown below:

<bean id="filepartitioner" class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
    <property name="resources" value="classpath:${filepath}" />
</bean>

<bean id="fileItemReader" scope="step" autowire-candidate="false" parent="itemReaderParent">
    <property name="resource" value="#{stepExecutionContext[fileName]}" />
</bean>

<step id="step1" xmlns="http://www.springframework.org/schema/batch">
    <tasklet transaction-manager="ratransactionManager">
        <chunk writer="jdbcItenWriter" reader="fileItemReader" processor="itemProcessor" commit-interval="800" retry-limit="3">
            <retryable-exception-classes>
                <include class="org.springframework.dao.DeadlockLoserDataAccessException"/>
            </retryable-exception-classes>
        </chunk>
        <listeners>
            <listener ref="customStepExecutionListener"/>
        </listeners>
    </tasklet>
    <fail on="FAILED"/>
</step>

UPDATES:

It seems that the main table (where the direct inserts happen) is referenced by other tables and materialized views. If I delete data from this table to remove stale records using a processed-column indicator, the data spooled through the MV will show old data. I think a staging table is needed for my requirement.

To implement a staging table for this requirement, I see two options:

Create another parallel step that polls the database and writes the data whose processed column value is Y.
Transfer the data at the end of each successfully completed file using a step listener (afterStep method).

Or are there any other suggestions?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T15:47:36+00:00

In general I agree with @MichaelLange's approach, but perhaps a separate table is too much. You can add a column, completed, to your import table; while it is “false”, the record belongs to a file that is still being processed (or whose processing failed). After you’ve processed the file, you issue a simple update on this table (it should not fail, as there are no constraints on this column):

update import_table set completed = true where file_name = 'file001_chunk1.txt'

Before processing a file you should remove “stale” records:

delete from import_table where file_name = 'file001_chunk1.txt'

This solution would be faster and easier to implement than nested transactions. You may run into table locks with this approach, but an appropriate isolation level can minimise them. Optionally, you can create a view over this table that filters out the non-completed records (consider indexing the completed column):

create view import_view as select a, b, c from import_table where completed = true
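The delete-then-mark sequence described above could be sketched in plain JDBC as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the table and column names are the ones used in the SQL above. In Spring Batch, deleteStaleRows would typically be called from a StepExecutionListener's beforeStep, and markCompleted from afterStep only when the step finished successfully.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper for the "completed" flag approach (names are assumptions). */
final class ImportFileMarker {

    private ImportFileMarker() {
    }

    /** Removes rows left behind by an earlier, failed run of the same file. */
    static int deleteStaleRows(Connection con, String fileName) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "delete from import_table where file_name = ?")) {
            ps.setString(1, fileName);
            return ps.executeUpdate();
        }
    }

    /** Flags the file's rows as completed once every chunk has been committed. */
    static int markCompleted(Connection con, String fileName) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "update import_table set completed = true where file_name = ?")) {
            ps.setString(1, fileName);
            return ps.executeUpdate();
        }
    }
}
```

Binding the file name as a parameter avoids quoting problems with unusual file names. Both methods return the affected row count, which can be logged or compared with the step's write count.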

In general I think nested transactions are not possible in this case, as chunks can be processed in parallel threads, each thread holding its own transaction context. The transaction manager will not be able to start a nested transaction in a new thread, even if you somehow manage to create a “main transaction” in the “top” job thread.
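The thread-bound nature of the transaction context can be illustrated with a plain ThreadLocal. This is a simplified stand-in, not Spring's actual API: the class and field names are hypothetical, and the only assumption carried over from Spring is that transactional resources are bound to the current thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo: a context bound to one thread is invisible to another. */
final class ThreadBoundContextDemo {

    // Stand-in for a thread-bound transaction context (an assumption for illustration).
    static final ThreadLocal<String> CURRENT_TX = new ThreadLocal<>();

    private ThreadBoundContextDemo() {
    }

    /** Binds a "transaction" in the calling thread and returns what a worker thread sees. */
    static String txSeenByWorker() throws Exception {
        CURRENT_TX.set("main-tx");
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> readContext = CURRENT_TX::get;
            // The worker has its own, empty slot, so it cannot join "main-tx".
            return pool.submit(readContext).get();
        } finally {
            pool.shutdown();
            CURRENT_TX.remove();
        }
    }
}
```

Calling ThreadBoundContextDemo.txSeenByWorker() returns null, because the value set in the calling thread is never visible to the worker. A partition running in its own thread is in the same position: it has no outer transaction to nest into.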

Yet another approach is a variation on the “temporary table” idea. The import process creates a separate import table per run and names it according to e.g. the date:

import_table_2011_10_01
import_table_2011_10_02
import_table_2011_10_05
...
etc

and a “super-view” that combines all these tables:

create view import_table as
select * from import_table_2011_10_01
union
select * from import_table_2011_10_02
union
select * from import_table_2011_10_05

After an import succeeds, the “super-view” should be re-created so that it includes the new table.
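Re-creating the view means regenerating its definition from the current list of import tables. A minimal sketch of that step, assuming the table names come from a trusted source such as the import job itself (the class and method names are hypothetical, and how the old view is dropped or replaced depends on the database):

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that builds the "super-view" DDL from per-date import tables. */
final class SuperViewSql {

    private SuperViewSql() {
    }

    /**
     * Builds a "create view" statement that unions the given tables.
     * Identifiers cannot be bound as JDBC parameters, so the names are
     * concatenated and must never come from untrusted input.
     */
    static String createViewSql(String viewName, List<String> tableNames) {
        if (tableNames.isEmpty()) {
            throw new IllegalArgumentException("at least one import table is required");
        }
        String body = tableNames.stream()
                .map(table -> "select * from " + table)
                .collect(Collectors.joining("\nunion\n"));
        return "create view " + viewName + " as\n" + body;
    }
}
```

For the three tables listed above, createViewSql("import_table", ...) produces the same statement as the hand-written view definition.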

With this approach you will have difficulties with foreign keys that reference the import table.

Yet another approach is to use a separate DB for import and then feed the imported data from the import DB to main (e.g. transfer the binary data).
