I know that we can call a map-reduce job from a normal java application.

Asked: June 13, 2026 (2026-06-13T18:43:16+00:00)

I know that we can call a MapReduce job from a normal Java application. In my case, the MapReduce jobs have to deal with files on HDFS and also with files on another filesystem. Is it possible in Hadoop to access files from another filesystem while simultaneously using files on HDFS?

Basically, I have one large file that I want to put in HDFS for parallel computing, and I then want to compare the blocks of this file with some other files (which I do not want to put in HDFS because each of them needs to be accessed as a full-length file at once).

1 Answer

Editorial Team · Answer 1 · 2026-06-13T18:43:18+00:00

You can use the distributed cache to distribute the files to your mappers. They can open and read the files in their configure() method (don't read them in map(), because it is called once for every input record).

edit

To access files from the local filesystem in your MapReduce job, add those files to the distributed cache when you set up your job configuration.

JobConf job = new JobConf();
DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), job);

The MapReduce framework will make sure those files are accessible by your mappers.

public void configure(JobConf job) {
    // Get the cached archives/files
    Path[] localFiles = DistributedCache.getLocalCacheFiles(job);

    // open, read and store for use in the map phase.
}

The framework then removes the files when your job is done.
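The "open, read and store" step in configure() could be sketched roughly as below. This is only an illustration in plain Java with no Hadoop dependency: the class name CachedLookup, its load/contains/size methods, and the one-entry-per-line UTF-8 file format are all assumptions, not part of the Hadoop API. The idea is to call load() once per task with one of the local paths returned by DistributedCache.getLocalCacheFiles(job), then query the in-memory copy from map().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not a Hadoop class): keeps one cached side file in memory.
final class CachedLookup {
    private final Set<String> entries = new HashSet<>();

    private CachedLookup() {}

    // Meant to be called once per task, e.g. from configure(), with a local
    // path obtained from the distributed cache.
    // Assumption: the file is UTF-8 text with one entry per line.
    static CachedLookup load(String localPath) throws IOException {
        CachedLookup lookup = new CachedLookup();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(localPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    lookup.entries.add(line); // skip blank lines, keep everything else
                }
            }
        }
        return lookup;
    }

    // Cheap in-memory check, suitable for calling on every record in map().
    boolean contains(String record) {
        return entries.contains(record);
    }

    int size() {
        return entries.size();
    }
}
```

In a mapper you would keep the CachedLookup in a field, assign it in configure(), and only call contains() inside map(), so the file is read once per task rather than once per record.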
