Does MapReduce job input have to be in HDFS, or can it be on another file system?

Question

Asked: May 28, 2026 (2026-05-28T06:20:38+00:00)

I have heard that the input for MapReduce jobs does not need to be in HDFS and can be on another file system. Can someone please explain this in more detail?

I am a little confused. In standalone mode, the data can be on the local file system. But in cluster mode, how can we point MapReduce jobs at some other file system?

1 Answer

Editorial Team · Answer 1 · 2026-05-28T06:20:39+00:00

No, it does not need to be in HDFS. For instance, jobs that target HBase through its TableInputFormat pull records over the network from HBase nodes as input to their map tasks. DBInputFormat can be used to pull data from a SQL database into a job. You could also build an input format that reads data off an NFS mount.

In practice you want to avoid pulling data over the network if you can. MR performance is much better when the data sits locally on the nodes where the job runs, since local disk throughput is generally higher than network throughput.
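On the cluster-mode part of the question: for file-based input, Hadoop picks the file system from the scheme of the input path, and a path with no scheme falls back to the cluster default (the `fs.defaultFS` setting). So in a real driver, a call such as `FileInputFormat.addInputPath(job, new Path("file:///mnt/shared/input"))` would read from a mount rather than HDFS, provided that path is readable at the same location on every worker node. The snippet below is only a toy model of that scheme lookup using the Java standard library, not Hadoop code; the class name, the `resolveScheme` helper, the host `namenode:8020`, and the paths are all made up for illustration.

```java
import java.net.URI;

// Toy model (not Hadoop code) of how an input path maps to a file system.
final class InputSchemeSketch {

    // Returns the scheme that decides which file system serves the path.
    // A path without a scheme falls back to the scheme of the default
    // file system (what a cluster would configure as fs.defaultFS).
    static String resolveScheme(String inputPath, String defaultFs) {
        String scheme = URI.create(inputPath).getScheme();
        return scheme != null ? scheme : URI.create(defaultFs).getScheme();
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://namenode:8020"; // hypothetical cluster default

        // No scheme: resolved against the default file system.
        System.out.println(resolveScheme("/user/alice/input", defaultFs));

        // Explicit scheme: the path names its own file system.
        System.out.println(resolveScheme("file:///mnt/shared/input", defaultFs));
    }
}
```

The point of the sketch is that nothing in the job setup is HDFS-specific: the same input path argument can name a different file system just by carrying a different scheme, at the cost of losing data locality.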
