I’ve several hadoop streaming api programs and produce output with this outputformat: org.apache.hadoop.mapred.SequenceFileOutputFormat And

Asked: June 3, 2026

I have several Hadoop streaming API programs that produce output with this output format:
“org.apache.hadoop.mapred.SequenceFileOutputFormat”
A streaming API program can read these files with the input format “org.apache.hadoop.mapred.SequenceFileAsTextInputFormat”.

The data in the output file looks like this:

val1-1,val1-2,val1-3
val2-1,val2-2,val2-3
val3-1,val3-2,val3-3

Now I want to read the output with Hive. I created a table with this script:

CREATE EXTERNAL 
TABLE IF NOT EXISTS table1
(
col1 int,
col2 string,
col3 int
)
PARTITIONED BY (year STRING,month STRING,day STRING,hour STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
LOCATION '/hive/table1';

When I query the data with

select * from table1

the result is

val1-2,val1-3
val2-2,val2-3
val3-2,val3-3

It seems the first column has been ignored. I think Hive uses only the values as output, not the keys. Any ideas?

1 Answer

Editorial Team · Answer 1 · 2026-06-03T09:04:32+00:00

You are correct. One of the limitations of Hive right now is that it ignores the keys in the sequence file format. By “right now” I mean Hive 0.7, but I believe Hive 0.8 and Hive 0.9 have the same limitation.

To work around this, you may have to create a new input format whose key is null and whose value is the combination of your present key and value. Sorry, I know this was not the answer you were looking for!
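For what it's worth, here is a rough sketch of the record-merging step such an input format would perform. It is plain Java so it runs without Hadoop on the classpath. KeyValue, KeyIntoValueSketch, merge and rowsSeenByHive are hypothetical names made up for illustration, not Hadoop or Hive classes. In a real job this logic would sit inside the record reader returned by your custom input format, and the delimiter would have to match the field delimiter declared for the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one (key, value) record read from a sequence file. */
final class KeyValue {
    final String key;
    final String value;

    KeyValue(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

/** Illustration only: folds the key into the value so that a reader that drops keys loses nothing. */
final class KeyIntoValueSketch {

    /** Returns a record with a null key and a value of key + delimiter + value. */
    static KeyValue merge(KeyValue record, String delimiter) {
        if (record.key == null || record.key.isEmpty()) {
            // Nothing to fold in; pass the value through unchanged.
            return new KeyValue(null, record.value);
        }
        return new KeyValue(null, record.key + delimiter + record.value);
    }

    /** Simulates a consumer that looks only at values, which is how Hive behaves here. */
    static List<String> rowsSeenByHive(List<KeyValue> records, String delimiter) {
        List<String> rows = new ArrayList<>();
        for (KeyValue record : records) {
            rows.add(merge(record, delimiter).value);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample records shaped like the data in the question: first column as key, rest as value.
        List<KeyValue> records = Arrays.asList(
                new KeyValue("val1-1", "val1-2,val1-3"),
                new KeyValue("val2-1", "val2-2,val2-3"),
                new KeyValue("val3-1", "val3-2,val3-3"));
        for (String row : rowsSeenByHive(records, ",")) {
            System.out.println(row); // first line printed: val1-1,val1-2,val1-3
        }
    }
}
```

With the key folded into the value this way, all three columns reach Hive even though it never looks at the key.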
