I was looking for a Disk intensive Hadoop application to test the I/O activity

Asked: June 15, 2026

I was looking for a disk-intensive Hadoop application to test I/O activity in Hadoop, but I couldn't find one that kept disk utilization above, say, 50%, or that otherwise kept the disks busy. I tried randomwriter, but surprisingly it is not disk I/O intensive.

So I wrote a tiny program that creates a file in the Mapper and writes some text into it. It works, but disk utilization is high only on the master node, which is also the NameNode, the JobTracker, and one of the slaves. On the other TaskTrackers, disk utilization is nil or negligible. I can't understand why disk I/O is so low on the TaskTrackers. Could anyone nudge me in the right direction if I'm doing something wrong? Thanks in advance.

Here is the code segment I added to WordCount.java to create a file and write a UTF string into it:

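// Needs org.apache.hadoop.conf.Configuration and
// org.apache.hadoop.fs.{FileSystem, Path, FSDataOutputStream}.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// One scratch file per task attempt, reused for every token.
Path outFile = new Path("./dummy" + context.getTaskAttemptID());
while (itr.hasMoreTokens()) {
    word.set(itr.nextToken());
    context.write(word, one);

    // Create the file, write a short string, and close it again.
    try (FSDataOutputStream out = fs.create(outFile)) {
        out.writeUTF("helloworld");
    }
    // delete(Path) is deprecated; false means non-recursive.
    fs.delete(outFile, false);
}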
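For comparison, note how little data the loop above writes: writeUTF("helloworld") emits 12 bytes per file (a 2-byte length prefix plus 10 bytes of text), and the file is deleted right away, so caches may absorb most of it before it reaches a disk. The following is a minimal, Hadoop-free sketch of a write loop that pushes larger chunks and forces them to the storage device. The class name DiskWriteSketch, the method writeAndSync, and the chunk sizes are assumptions made for illustration. They are not part of Hadoop or of the original program, and this is not a substitute for a proper benchmark.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper (not a Hadoop class): writes fixed-size chunks to a
// local file and forces each chunk to the storage device.
class DiskWriteSketch {

    // Writes `chunks` buffers of `chunkBytes` zero bytes each to `file`
    // and returns the total number of bytes written.
    static long writeAndSync(Path file, int chunkBytes, int chunks) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(chunkBytes);
        long written = 0;
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < chunks; i++) {
                buffer.clear(); // reset position so the whole buffer is written again
                while (buffer.hasRemaining()) {
                    written += channel.write(buffer);
                }
                // Ask the OS to flush file content to the device instead of
                // leaving it in the page cache.
                channel.force(false);
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("disk-write-sketch", ".bin");
        try {
            // Assumed sizes: 8 chunks of 1 MiB each.
            long bytes = writeAndSync(file, 1 << 20, 8);
            System.out.println("bytes written: " + bytes);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Running it while watching a tool such as iostat gives a rough baseline for what sustained writes look like on one machine. Larger chunk counts keep the disk busy for longer.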

1 Answer

Editorial Team · Answer 1 · 2026-06-15T05:48:50+00:00

OK, I should have checked this earlier. The actual problem was that not all of my data nodes were actually running. After I reformatted the NameNode, everything fell back into place, and I was getting a utilization of 15-20%, which is not bad for WordCount. I will run TestDFSIO next and see whether I can push disk utilization even higher.
