I am trying to read a file which has lines in the following format.


Editorial Team

Asked: June 17, 2026 (2026-06-17T15:18:54+00:00)


I am trying to read a file which has lines in the following format.

100,1:2:3
200,10:20:30

Assuming that the inputs will always be numbers, I am trying to read the file by setting the input key and value as IntWritable and Text respectively. But when I run it, I get the following error:

java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

I understand what the error means, but I cannot figure out how to read the key as an integer. The code runs fine if I read the key as Text as well. I have checked the code for any configuration I might have missed, but it looks fine to me.

conf.set("mapred.textoutputformat.separator", "|");

conf.setInputFormatClass(KeyValueTextInputFormat.class);
conf.setOutputFormatClass(TextOutputFormat.class);

conf.setOutputKeyClass(IntWritable.class);
conf.setOutputValueClass(Text.class);

I have also checked the mapper class and its methods (there is no reducer). Can KeyValueTextInputFormat read the key only as Text? I cannot see what I am doing wrong. Any help would be deeply appreciated.

Thanks,
EG


1 Answer

Editorial Team · Answer 1 · 2026-06-17T15:18:56+00:00

Looking at the source of KeyValueTextInputFormat, it extends from FileInputFormat<Text, Text>. What that means is that both key and value for your input are expected to be Text.

You could fix that by implementing your own RecordReader, modeled after KeyValueLineRecordReader, but extending RecordReader<IntWritable, Text> instead and modifying the code accordingly.
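The part of such a RecordReader that actually changes is the step that splits each line and converts the key. Here is a minimal sketch of that step in plain Java, without the Hadoop classes, so it can be tried on its own. The class name IntKeyLineParser, the parse method and the comma separator are assumptions made for illustration; in a real RecordReader<IntWritable, Text> you would presumably copy the parsed values into the IntWritable key and the Text value instead of returning a Map.Entry.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: shows only the split-and-parse step.
final class IntKeyLineParser {
    private IntKeyLineParser() {}

    // Splits the line at the first separator, parses the left part as an int
    // and keeps the right part as text. Throws NumberFormatException if the
    // key is not a valid integer.
    static Map.Entry<Integer, String> parse(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // Assumption: with no separator, treat the whole line as the key
            // and use an empty value.
            return new SimpleImmutableEntry<>(Integer.parseInt(line.trim()), "");
        }
        int key = Integer.parseInt(line.substring(0, pos).trim());
        return new SimpleImmutableEntry<>(key, line.substring(pos + 1));
    }

    public static void main(String[] args) {
        Map.Entry<Integer, String> entry = parse("100,1:2:3", ',');
        System.out.println(entry.getKey() + " -> " + entry.getValue()); // prints "100 -> 1:2:3"
    }
}
```

Since the question assumes the keys are always numbers, the sketch lets NumberFormatException propagate; a production reader might prefer to skip or count malformed lines instead.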

Once you have your RecordReader, you can create your own InputFormat that uses it, and then in your main code you just need to set your new InputFormat like this:

conf.setInputFormatClass(KeyValueMyInputFormat.class);

Another approach, which I would recommend if you’re really worried about performance, is to use SequenceFileInputFormat. This involves storing your input as SequenceFiles, so the data is already in binary format. That avoids the overhead of parsing every line, which you need to do in your case. You can use this format like this:

conf.setInputFormatClass(SequenceFileInputFormat.class);
