Hadoop/Hive newbie here. I am trying to use data stored in a custom text-based

Question

Asked: May 26, 2026 (2026-05-26T13:41:49+00:00)

Hadoop/Hive newbie here. I am trying to use data stored in a custom text-based format with Hive. My understanding is you can either write a custom FileFormat or a custom SerDe class to do that. Is that the case or am I misunderstanding it? And what are some general guidelines on which option to choose when? Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-05-26T13:41:50+00:00

I figured it out. I did not have to write a SerDe after all. Instead, I wrote a custom InputFormat (extending org.apache.hadoop.mapred.TextInputFormat) that returns a custom RecordReader (implementing org.apache.hadoop.mapred.RecordReader<K, V>). The RecordReader contains the logic to read and parse my files, and it returns tab-delimited rows.
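
For anyone who wants to see the shape of that parsing step, here is a minimal sketch using only the JDK. It is not my actual reader: the "key=value|key=value" input layout, the class name TabRowSketch, and both method names are hypothetical, since the real format is specific to my files. In a real RecordReader, the body of toTabDelimitedRow would run inside next(key, value), and its result would be copied into the Text value that Hive receives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/**
 * Sketch of the parsing step a custom RecordReader would perform.
 * The "key=value|key=value" input layout is a HYPOTHETICAL stand-in
 * for the custom format, which the post does not describe.
 */
class TabRowSketch {

    /** Converts one raw record into a tab-delimited row of values. */
    static String toTabDelimitedRow(String rawRecord) {
        StringJoiner row = new StringJoiner("\t");
        // Limit -1 keeps trailing empty fields so the column count stays stable.
        for (String pair : rawRecord.split("\\|", -1)) {
            int eq = pair.indexOf('=');
            // Keep only the value; a pair without '=' is treated as a bare value.
            String value = (eq >= 0) ? pair.substring(eq + 1) : pair;
            // A literal tab inside a value would shift the columns, so replace it.
            row.add(value.replace('\t', ' '));
        }
        return row.toString();
    }

    /**
     * Mimics the loop that is driven through RecordReader.next(key, value):
     * take one raw record at a time and hand back one tab-delimited row.
     */
    static List<String> readAll(String fileContent) {
        List<String> rows = new ArrayList<>();
        for (String line : fileContent.split("\\R")) {
            if (line.isEmpty()) {
                continue; // skipping blank lines is an assumption about the format
            }
            rows.add(toTabDelimitedRow(line));
        }
        return rows;
    }
}
```

Each string returned by toTabDelimitedRow is one row as Hive would see it, so the number of values per record should match the number of columns in the table definition.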

With that in place, I declared my table as:

CREATE TABLE t2 (
  field1 string,
  ..
  fieldNN float)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT 'namespace.CustomFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

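Because the table is declared with ROW FORMAT DELIMITED, Hive uses its native SerDe (LazySimpleSerDe) to split each tab-delimited row that the RecordReader returns. Hive also requires an output format to be specified whenever a custom input format is used, so I chose one of the built-in output formats.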
