What I want to achieve: I have set up a Spring Batch Job containing

Asked: June 11, 2026 (2026-06-11T21:48:51+00:00)

What I want to achieve:

I have set up a Spring Batch Job containing Hadoop Tasks to process some larger files.
To get multiple Reducers running for the job, I need to set the number of Reducers with setNumReduceTasks. I’m trying to set this via the JobFactoryBean.

My bean configuration in classpath:/META-INF/spring/batch-common.xml:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:p="http://www.springframework.org/schema/p"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean id="jobFactoryBean" class="org.springframework.data.hadoop.mapreduce.JobFactoryBean" p:numberReducers="5"/>
    <bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean" />
    <bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>
    <bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher" p:jobRepository-ref="jobRepository" />
</beans>

The XML is included via:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd">

    <context:property-placeholder location="classpath:batch.properties,classpath:hadoop.properties"
            ignore-resource-not-found="true" ignore-unresolvable="true" />


    <import resource="classpath:/META-INF/spring/batch-common.xml" />
    <import resource="classpath:/META-INF/spring/hadoop-context.xml" />
    <import resource="classpath:/META-INF/spring/sort-context.xml" />

</beans>

In the JUnit test I fetch the beans via:

    JobLauncher launcher = ctx.getBean(JobLauncher.class);
    Map<String, Job> jobs = ctx.getBeansOfType(Job.class);
    JobFactoryBean jfb = ctx.getBean(JobFactoryBean.class);

The JUnit test stops with an error:

No bean named '&jobFactoryBean' is defined

So: the JobFactoryBean is not loaded, but the others are loaded correctly and without an error.

Without the line

JobFactoryBean jfb = ctx.getBean(JobFactoryBean.class);

the project's tests run, but there is just one Reducer per job.

The method

ctx.getBean("jobFactoryBean");

returns a Hadoop Job. I would expect to get the factoryBean there…
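The behaviour described above (a lookup by plain name returns the product, not the factory) matches Spring's documented convention that a bean name prefixed with `&` refers to the FactoryBean itself. The following is only a toy model of that naming convention in plain Java, not Spring code; the class `FactoryLookupSketch`, its methods, and the string standing in for the Hadoop Job are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy model of the "&" naming convention. This is NOT Spring code;
// every name in this class is hypothetical and for illustration only.
class FactoryLookupSketch {
    static final String FACTORY_PREFIX = "&";

    // Maps a plain bean name to the factory that produces the bean.
    private final Map<String, Supplier<?>> factories = new HashMap<>();

    void register(String name, Supplier<?> factory) {
        factories.put(name, factory);
    }

    // Plain name -> the product of the factory; "&" + name -> the factory itself.
    Object getBean(String name) {
        boolean wantsFactory = name.startsWith(FACTORY_PREFIX);
        String plainName = wantsFactory ? name.substring(FACTORY_PREFIX.length()) : name;
        Supplier<?> factory = factories.get(plainName);
        if (factory == null) {
            throw new IllegalArgumentException("No bean named '" + name + "' is defined");
        }
        return wantsFactory ? factory : factory.get();
    }

    public static void main(String[] args) {
        FactoryLookupSketch ctx = new FactoryLookupSketch();
        ctx.register("jobFactoryBean", () -> "hadoop-job"); // stand-in for a Hadoop Job
        System.out.println(ctx.getBean("jobFactoryBean"));
        System.out.println(ctx.getBean("&jobFactoryBean") instanceof Supplier);
    }
}
```

In Spring itself the analogous call would be `ctx.getBean("&jobFactoryBean")`. Whether that avoids the error above in this setup is untested here.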

To check this, I extended the constructor of the Reducer to log every instantiation. So far I get only one entry in the log.

I have two VMs with 2 cores and 2 GB RAM each, and I’m trying to sort a 75 MB file consisting of multiple books from Project Gutenberg.

EDIT:

Another thing I have tried is setting the number of reducers on the Hadoop job via the number-reducers attribute, without any effect.

<job id="search-jobSherlockOk" input-path="${sherlock.input.path}"
    output-path="${sherlockOK.output.path}"
    mapper="com.romediusweiss.hadoopSort.mapReduce.SortMapperWords"
    reducer="com.romediusweiss.hadoopSort.mapReduce.SortBlockReducer"
    partitioner="com.romediusweiss.hadoopSort.mapReduce.SortPartitioner"
    number-reducers="2"
    validate-paths="false" />

The settings in mapreduce-site.xml are the same on both nodes:

<property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>10</value>
</property>

…and why:

I want to copy the example of the following blog post:
http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/

I need different Reducers on the same machine or a fully distributed environment to test the behaviour of the Partitioner. The first approach would be easier.
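To illustrate why a single Reducer cannot exercise a Partitioner, here is a minimal sketch in plain Java. It uses the same arithmetic as Hadoop's default HashPartitioner (clear the sign bit of the hash, then take the modulus) as a stand-in, because the SortPartitioner from this project is not shown. The class `PartitionSketch`, its methods, and the sample words are hypothetical, and `String` stands in for Hadoop's `Text`, so the actual partition numbers on a cluster would differ.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of how map output keys are spread over reducers.
// Class and method names are hypothetical; String stands in for Hadoop's Text.
class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // clear the sign bit of the hash code, then take it modulo the reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many keys land in each partition.
    static int[] histogram(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("sherlock", "holmes", "watson", "baker", "street");
        // With one reducer every key necessarily falls into partition 0.
        System.out.println("1 reducer:  " + Arrays.toString(histogram(words, 1)));
        // With two reducers the keys are split according to their hash codes.
        System.out.println("2 reducers: " + Arrays.toString(histogram(words, 2)));
    }
}
```

With `numReduceTasks` equal to 1 the modulus is always 0, so the partitioning logic has no observable effect until at least two reduce tasks are configured.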

P.S.: Could a user with higher reputation create a “spring-data-hadoop” tag? Thank you!