I’m a little confused on how the Hadoop Distributed File System is set up

Question


Asked: June 7, 2026


I’m a little confused about how the Hadoop Distributed File System is set up and how my particular setup affects it. I used this guide to set it up http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ using two virtual machines on VirtualBox and have run the example (just a simple word count with txt file input). So far, I know that the datanode manages and retrieves the files on its node, while the tasktracker analyzes the data.

1) When you use the command -copyFromLocal, are you copying files/input to the HDFS? Does Hadoop know how to divide the information between the slaves/master, and how does it do it?

2) In the configuration outlined in the guide linked above, are there technically two slaves (the master acts as both the master and a slave)? Is this common or is the master machine usually only given jobtracker/namenode tasks?


1 Answer

Editorial Team · Answer 1 · 2026-06-07T04:17:27+00:00

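1) Yes, -copyFromLocal copies files from the local file system into HDFS, and Hadoop decides where the data goes. A write proceeds roughly as follows: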

The client connects to the name node to register a new file in HDFS.
The name node creates some metadata about the file (using either the default block size or a value configured for the file).
For each block of data to be written, the client asks the name node for a block ID and a list of destination datanodes. The data is then written to each of those datanodes.
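The steps above can be sketched as a toy model in Python. This is not Hadoop code and it does not reproduce HDFS's real replica placement policy; it only illustrates that the name node keeps metadata and hands out one block ID plus a list of target datanodes per block. The 64 MB block size (the Hadoop 1.x default), the replication factor of 2, the hostnames "master" and "slave", the path, and the round-robin placement are all assumptions made for the illustration.

```python
import itertools
import math

BLOCK_SIZE = 64 * 1024 * 1024  # assumption: 64 MB, the Hadoop 1.x default block size
REPLICATION = 2                # assumption: two replicas on a two-node cluster


class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where replicas go."""

    def __init__(self, datanodes, replication=REPLICATION):
        self.datanodes = list(datanodes)
        # Cannot place more replicas than there are datanodes.
        self.replication = min(replication, len(self.datanodes))
        self.files = {}                    # path -> [(block_id, [datanode, ...]), ...]
        self._next_id = itertools.count(1)
        self._cursor = 0                   # round-robin cursor (toy policy, not HDFS's)

    def create(self, path):
        """Step 1 and 2: register the file and create its (empty) metadata."""
        self.files[path] = []

    def allocate_block(self, path):
        """Step 3: return a new block ID and the datanodes that should store it."""
        block_id = next(self._next_id)
        n = len(self.datanodes)
        targets = [self.datanodes[(self._cursor + i) % n]
                   for i in range(self.replication)]
        self._cursor = (self._cursor + 1) % n
        self.files[path].append((block_id, targets))
        return block_id, targets


def copy_from_local(namenode, path, file_size, block_size=BLOCK_SIZE):
    """Client side: ask the name node for one allocation per block of the file."""
    namenode.create(path)
    num_blocks = math.ceil(file_size / block_size)
    return [namenode.allocate_block(path) for _ in range(num_blocks)]


if __name__ == "__main__":
    nn = ToyNameNode(["master", "slave"])
    # Hypothetical 150 MB input file; the path is illustrative only.
    for block_id, targets in copy_from_local(nn, "/user/hduser/input.txt",
                                             150 * 1024 * 1024):
        print(block_id, targets)
```

In this sketch the file contents never pass through the name node; in real HDFS the client likewise sends block data to the datanodes, and the name node only tracks where the blocks live.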

There is some more information in the Javadoc for org.apache.hadoop.hdfs.DFSClient.DFSOutputStream.

2) Some production systems will be configured to make the master its own dedicated node (allowing the maximum possible memory allocation and avoiding CPU contention), but in a smaller cluster it is acceptable for one node to run both a name node and a data node.
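To make the two layouts concrete, here is a small Python sketch that lists which daemons run on which host and counts the hosts acting as slaves. The hostnames "master" and "slave" are illustrative, and the daemon sets are my reading of a typical two-node Hadoop 1.x setup like the one in the linked guide, so treat them as an assumption and check your own conf/slaves file.

```python
# Daemons per host. Hostnames are illustrative; the daemon sets are an assumption
# about a typical two-node Hadoop 1.x cluster, not output from a real cluster.
SHARED_MASTER = {
    "master": {"NameNode", "JobTracker", "DataNode", "TaskTracker"},
    "slave": {"DataNode", "TaskTracker"},
}

DEDICATED_MASTER = {
    "master": {"NameNode", "JobTracker"},
    "slave": {"DataNode", "TaskTracker"},
}


def worker_hosts(layout):
    """Hosts that store blocks (run a DataNode), i.e. hosts that act as slaves."""
    return sorted(host for host, daemons in layout.items()
                  if "DataNode" in daemons)


if __name__ == "__main__":
    print(worker_hosts(SHARED_MASTER))     # master counts as a slave too
    print(worker_hosts(DEDICATED_MASTER))  # only the slave stores data
```

With the shared layout there are two hosts doing slave work, which matches the "master acts as both" reading in the question.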
