I have a directory OUTPUT where I have the output files from a Map

Question

Asked: June 5, 2026 (2026-06-05T18:01:53+00:00)

I have a directory OUTPUT that contains the output files from a MapReduce job. The output files are text files written with TextOutputFormat.

Now I want to read the key-value pairs back from the output files. How can I do this using existing Hadoop classes? One way I could do it is as follows:

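import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Assumes conf (a Configuration) and OUTPUT (the job output directory) are defined elsewhere.
FileSystem fs = FileSystem.get(conf);
// Match every output file of the job, e.g. part-r-00000
FileStatus[] files = fs.globStatus(new Path(OUTPUT + "/part-*"));
for (FileStatus file : files) {
  if (file.getLen() > 0) {
    // try-with-resources closes the reader and the underlying HDFS stream
    try (BufferedReader bin = new BufferedReader(
        new InputStreamReader(fs.open(file.getPath()), StandardCharsets.UTF_8))) {
      String s;
      while ((s = bin.readLine()) != null) {
        System.out.println(s);
      }
    }
  }
}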

This approach works, but it adds a lot of work because I now have to manually parse the key-value pairs out of each line. I am looking for something handier that reads the key and the value directly into variables.
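For reference, the manual parsing mentioned above could look roughly like the sketch below. It assumes the job used TextOutputFormat's default separator (a single tab between key and value). The class and method names (`TextOutputLines`, `parseLine`) are made up for illustration and are not part of Hadoop.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical helper, not a Hadoop class: splits one TextOutputFormat line by hand.
final class TextOutputLines {

    // Splits a line at the FIRST tab (assumed default separator).
    // A line without a tab is treated as a key with an empty value.
    static Map.Entry<String, String> parseLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleImmutableEntry<>(line, "");
        }
        return new SimpleImmutableEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = parseLine("hadoop\t42");
        System.out.println(kv.getKey() + " -> " + kv.getValue());
    }
}
```

Splitting at the first tab only means a value that itself contains tabs stays intact; both key and value still come back as strings that have to be converted to the original types by hand.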

1 Answer

Editorial Team · Answer 1 · 2026-06-05T18:01:56+00:00

Are you forced to use TextOutputFormat as your output format in the previous job?

If not, consider using SequenceFileOutputFormat; you can then use a SequenceFile.Reader to read the file back as key/value pairs. You can also still ‘view’ the file using hadoop fs -text path/to/output/part-r-00000
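A minimal sketch of reading such a file back, assuming Hadoop 2 or later on the classpath, that the previous job wrote its output with SequenceFileOutputFormat, and that the key and value types are Writable. The class name `ReadSequenceOutput` and the use of the first command-line argument as the part file path are illustrative choices only.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical driver class; requires the Hadoop client libraries.
public class ReadSequenceOutput {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Assumption: args[0] is one part file, e.g. path/to/output/part-r-00000
        Path part = new Path(args[0]);

        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(part))) {
            // The file header records the key and value classes, so they
            // can be instantiated without hard-coding the types here.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

            // next() fills key and value in place and returns false at end of file
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        }
    }
}
```

Because the key and value objects are reused on every call to next(), copy them if you need to keep earlier records around.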

EDIT: You can also use the KeyValueLineRecordReader class; you’ll just need to pass a FileSplit to the constructor.
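A rough sketch of that approach, assuming the older org.apache.hadoop.mapred API (whose KeyValueLineRecordReader takes a Configuration and a FileSplit in its constructor) and the default tab separator. The class name `ReadTextOutput` and the use of the first command-line argument as the output directory are illustrative; each file is wrapped in a single split that covers it from start to end.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.KeyValueLineRecordReader;

// Hypothetical driver class; requires the Hadoop client libraries.
public class ReadTextOutput {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Assumption: args[0] is the job output directory (OUTPUT in the question)
        FileStatus[] files = fs.globStatus(new Path(args[0] + "/part-*"));

        for (FileStatus file : files) {
            if (file.getLen() == 0) {
                continue; // skip empty part files
            }
            // One split spanning the whole file; host hints are not needed here
            FileSplit split = new FileSplit(file.getPath(), 0, file.getLen(), (String[]) null);
            KeyValueLineRecordReader reader = new KeyValueLineRecordReader(conf, split);
            try {
                Text key = reader.createKey();
                Text value = reader.createValue();
                // next() fills key and value in place and returns false at end of split
                while (reader.next(key, value)) {
                    System.out.println(key + " => " + value);
                }
            } finally {
                reader.close();
            }
        }
    }
}
```

Both key and value come back as Text, so numeric values still need converting, for example with Long.parseLong(value.toString()).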
