When working with TeraBytes of data, and for a typical data filtering problem, is

Question

Asked: June 11, 2026 (2026-06-11T23:50:26+00:00)

When working with terabytes of data on a typical data-filtering problem, is Apache Pig the right choice? Or is it better to write custom MapReduce code for the job?

1 Answer

Editorial Team · Answer 1 · 2026-06-11T23:50:27+00:00

Apache Pig is not a storage layer. Pig is a scripting language that simplifies writing code that runs on Hadoop. A Pig script is compiled into a set of Hadoop MapReduce jobs, which are submitted to Hadoop and run in the same way as any other MapReduce job.

Data storage is handled by Hadoop, not by Pig.

To answer your question: no, there is no limit on the size of the input data, as long as the input can be parsed by Pig load functions and is splittable by the Hadoop InputFormats.
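To see why splittability matters more than total size for a filtering job, here is a minimal plain-Python sketch. It is not Pig or Hadoop code: `split_records`, `filter_split`, `is_error`, and the sample records are all hypothetical names made up for illustration. It only shows that filtering each split independently gives the same result as filtering the whole input, which is what lets the work be spread across many tasks.

```python
from typing import Callable, Iterable, List


def split_records(records: List[str], split_size: int) -> List[List[str]]:
    """Cut the input into fixed-size chunks (a stand-in for input splits)."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def filter_split(split: Iterable[str], keep: Callable[[str], bool]) -> List[str]:
    """Filter one split with no knowledge of any other split."""
    return [record for record in split if keep(record)]


def is_error(line: str) -> bool:
    """Hypothetical predicate: keep lines whose second field is ERROR."""
    return line.split(",")[1] == "ERROR"


# Hypothetical sample input: "id,level" lines.
records = [f"{i},{'ERROR' if i % 3 == 0 else 'INFO'}" for i in range(10)]

splits = split_records(records, split_size=4)

# Each split could be handled by a separate task; here they run in a loop.
filtered = [record for split in splits for record in filter_split(split, is_error)]

# Filtering split by split matches filtering the whole input at once.
assert filtered == [record for record in records if is_error(record)]
```

Because no split depends on another, adding more input just means more splits, not a different program.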

Pig scripts are easier and faster to write than standard Java Hadoop jobs, and Pig has a lot of clever optimizations, such as multiquery execution, that can make complex queries run faster.
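The idea behind multiquery execution can be illustrated with a rough plain-Python sketch. This is not how Pig implements it; `multi_filter` and `CountingSource` are hypothetical helpers. The point is only that several filters over the same input can share a single scan instead of reading the data once per query.

```python
from typing import Callable, Dict, Iterable, List


class CountingSource:
    """Hypothetical input wrapper that counts how many times it is scanned."""

    def __init__(self, data: Iterable[int]) -> None:
        self.data = list(data)
        self.scans = 0

    def __iter__(self):
        self.scans += 1
        return iter(self.data)


def multi_filter(
    records: Iterable[int],
    predicates: Dict[str, Callable[[int], bool]],
) -> Dict[str, List[int]]:
    """Apply every predicate during one pass over the records."""
    outputs: Dict[str, List[int]] = {name: [] for name in predicates}
    for record in records:  # the only scan of the input
        for name, keep in predicates.items():
            if keep(record):
                outputs[name].append(record)
    return outputs


source = CountingSource(range(10))
results = multi_filter(
    source,
    {"even": lambda n: n % 2 == 0, "large": lambda n: n >= 7},
)

# Two filtered outputs were produced from a single read of the input.
assert source.scans == 1
```

With terabytes of input, avoiding a second full read is usually where the time is saved, and hand-written MapReduce jobs would need to be structured deliberately to get the same effect.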
