Hadoop/Hive newbie here. I am trying to use data stored in a custom text-based format with Hive. My understanding is you can either write a custom FileFormat or a custom SerDe class to do that. Is that the case or am I misunderstanding it? And what are some general guidelines on which option to choose when? Thanks!
I figured it out. I did not have to write a SerDe after all. Instead, I wrote a custom InputFormat (extends org.apache.hadoop.mapred.TextInputFormat) which returns a custom RecordReader (implements org.apache.hadoop.mapred.RecordReader<K, V>). The RecordReader implements the logic to read and parse my files and returns tab-delimited rows. With that in place, I declared my table using the custom input format.
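The exact statement is not reproduced here, so the following is only a sketch of what such a declaration could look like. The jar path, the table name `my_table`, the column list, and the class name `com.example.hive.MyInputFormat` are all hypothetical placeholders; `HiveIgnoreKeyTextOutputFormat` is one of Hive's built-in text output formats, picked here as an example. The tab delimiter matches the tab-delimited rows the RecordReader returns.

```sql
-- Hypothetical jar path: make the custom InputFormat class visible to Hive.
ADD JAR /path/to/my-input-format.jar;

-- Hypothetical table and columns; adjust to the fields your RecordReader emits.
CREATE TABLE my_table (
  col1 STRING,
  col2 INT
)
-- No SERDE clause: ROW FORMAT DELIMITED uses Hive's built-in (native) SerDe,
-- which splits each row returned by the RecordReader on the tab character.
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS
  -- Hypothetical class name for the custom input format.
  INPUTFORMAT 'com.example.hive.MyInputFormat'
  -- An output format must be named alongside a custom input format.
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```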
This uses a native SerDe. Also, Hive requires an output format to be specified whenever a custom input format is used, so I chose one of the built-in output formats.