What kind of file formats can be read using PIG? How can I store


Asked: May 28, 2026 (2026-05-28T19:04:56+00:00)


What kind of file formats can be read using PIG?

How can I store them in different formats? Say we have a CSV file and I want to store it as an MXL file; how can this be done? Whenever we use the STORE command, it makes a directory and stores the file as part-m-00000. How can I change the name of the file and overwrite the directory?


1 Answer

Editorial Team · Answer 1 · 2026-05-28T19:04:57+00:00

What kind of file formats can be read using PIG? How can I store them in different formats?

There are a few built-in loading and storing methods, but they are limited:

BinStorage – “binary” storage
PigStorage – loads and stores data that is delimited by something (such as tab or comma)
TextLoader – loads data line by line (i.e., delimited by the newline character)

Piggybank is a library of community-contributed user-defined functions. It has a number of loading and storing functions, including an XML loader, but not an XML storer.

Say we have a CSV file and I want to store it as an MXL file; how can this be done?

I assume you mean XML here… Storing XML is a bit rough in Hadoop because the output is split into one part file per map or reduce task, so how do you know where to put the root tag? Producing well-formed XML should likely be some sort of post-processing step.
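Such a post-processing step could look something like the following sketch. It assumes the part files have already been copied to a local directory (for example with hadoop fs -get), that every line in them is one self-contained XML fragment, and that the root tag is called items. The function name wrap_parts and the root tag are made up for illustration.

```python
import glob
import os


def wrap_parts(parts_dir, out_path, root_tag="items"):
    """Concatenate part-* files from parts_dir into one XML file with a single root."""
    # Sort so that part-m-00000 comes before part-m-00001, and so on.
    part_files = sorted(glob.glob(os.path.join(parts_dir, "part-*")))
    with open(out_path, "w") as out:
        out.write("<%s>\n" % root_tag)
        for path in part_files:
            with open(path) as part:
                for line in part:
                    line = line.strip()
                    if line:  # skip blank lines
                        out.write(line + "\n")
        out.write("</%s>\n" % root_tag)
    # Number of part files that were merged.
    return len(part_files)
```

The output file should be written somewhere other than under a part-* name in the same directory, otherwise a second run would pick it up as input.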

One thing you can do is to write a UDF that converts your columns into an XML string:

B = FOREACH A GENERATE customudfs.DataToXML(col1, col2, col3);

For example, say col1, col2, col3 are "foo", 37, "lemons", respectively. Your UDF can output the string "<item><name>foo</name><num>37</num><fruit>lemons</fruit></item>".
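The body of customudfs.DataToXML is not shown here, so the following is only a sketch of the logic such a UDF might contain, written in Python for brevity. The function name data_to_xml is hypothetical, the element names follow the example string above, and the escaping step is an addition so that values containing &, < or > do not break the markup.

```python
from xml.sax.saxutils import escape


def data_to_xml(name, num, fruit):
    """Build one <item> element from three column values."""
    # escape() replaces &, < and > so the values cannot break the markup.
    return "<item><name>%s</name><num>%s</num><fruit>%s</fruit></item>" % (
        escape(str(name)),
        escape(str(num)),
        escape(str(fruit)),
    )


print(data_to_xml("foo", 37, "lemons"))
# prints <item><name>foo</name><num>37</num><fruit>lemons</fruit></item>
```

Each record becomes one line of output, so the stored part files end up holding one XML fragment per line.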

Whenever we use the STORE command, it makes a directory and stores the file as part-m-00000. How can I change the name of the file and overwrite the directory?

You can’t change the name of the output file to something other than part-m-00000 (or part-r-00000 when the job has a reduce phase). That’s just how Hadoop works. If you want a different name, rename the file after the fact with something like hadoop fs -mv output/part-m-00000 newoutput/myoutputfile. This could be done with a bash script that runs the Pig script and then executes this command.
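A wrapper along those lines might look like the sketch below. It is written in Python rather than bash, and it adds one step that is not in the answer above: removing the old output directory first, since the job refuses to write into a directory that already exists. The function names, the script name myscript.pig, and the assumption that the installed Hadoop fs shell supports -rm -r -f are all illustrative; the destination directory is assumed to exist already.

```python
import subprocess


def build_commands(pig_script, output_dir, dest_path, part_name="part-m-00000"):
    """Return the commands to run, in order, as argument lists."""
    return [
        # Remove any previous output; STORE fails if the directory exists.
        ["hadoop", "fs", "-rm", "-r", "-f", output_dir],
        # Run the Pig script, which is assumed to STORE into output_dir.
        ["pig", pig_script],
        # Rename the single part file to the desired name.
        ["hadoop", "fs", "-mv", output_dir + "/" + part_name, dest_path],
    ]


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for cmd in commands:
        # check_call raises CalledProcessError if a command exits non-zero.
        subprocess.check_call(cmd)
```

Calling run_all(build_commands("myscript.pig", "output", "newoutput/myoutputfile")) would then clear the old output, run the script, and move the part file. This only makes sense when the job writes a single part file; with several part files they would need to be merged first.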
