I’m trying to set up a trial cassandra + pig cluster. The cassandra wiki

Question


Asked: May 28, 2026


I’m trying to set up a trial Cassandra + Pig cluster. The Cassandra wiki makes it sound like you need Hadoop to integrate with Pig.

But the README in cassandra-src/contrib/pig makes it sound like you can run Pig on Cassandra without Hadoop.

If Hadoop is optional, what do you lose by not using it?

1 Answer

Editorial Team · Answer 1 · 2026-05-28T01:24:21+00:00

Hadoop is optional only when you are testing things out. To do anything at scale, you will need Hadoop as well.

Running without Hadoop means you are running Pig in local mode, so all the data is processed by the single Pig process you launched. This works fine with a single node and example data.
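The choice between the two modes comes down to the execution type you pass to Pig on the command line (`-x local` versus `-x mapreduce`). The following is a minimal Python sketch of that decision rule, assuming the caller already knows whether the setup spans multiple machines or holds a large dataset; the helper name `pig_command` and the script name `example-script.pig` are hypothetical.

```python
import shlex
from typing import List


def pig_command(script: str, multi_node: bool, large_data: bool) -> List[str]:
    """Build a Pig command line for `script` (hypothetical helper).

    `-x local` runs the whole job inside one local Pig process;
    `-x mapreduce` submits the job to the Hadoop cluster instead.
    """
    # Local mode is only a good fit for a single node with small example data.
    exec_type = "mapreduce" if (multi_node or large_data) else "local"
    return ["pig", "-x", exec_type, script]


if __name__ == "__main__":
    # Trial setup: one Cassandra node with example data.
    print(shlex.join(pig_command("example-script.pig", multi_node=False, large_data=False)))
    # Real cluster: several Cassandra nodes.
    print(shlex.join(pig_command("example-script.pig", multi_node=True, large_data=False)))
```

Running it prints `pig -x local example-script.pig` followed by `pig -x mapreduce example-script.pig`. What counts as "large" data is left to the caller on purpose, since the answer gives no fixed threshold.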

When you are working with a significant amount of data or multiple machines, you want to run Pig in Hadoop (MapReduce) mode. If you run Hadoop TaskTrackers on your Cassandra nodes, Pig can take advantage of what MapReduce provides: the workload is distributed across the nodes, and data locality reduces network transfer.
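To make the data-locality point concrete, here is a toy Python model of a full-table scan. It assumes each row lives on exactly one node (no replication) and ignores the shuffle/reduce phase and how Cassandra actually computes input splits; the function `simulate` and the node names are hypothetical. It only compares how many rows cross the network and how many rows the busiest worker must process in each mode.

```python
from collections import Counter
from typing import Dict, List


def simulate(row_owner: List[str], client: str) -> Dict[str, Dict[str, int]]:
    """Compare a full scan in Pig local mode vs. Hadoop mode (toy model).

    row_owner[i] is the Cassandra node that stores row i.
    """
    # Local mode: one Pig process on `client` reads every row, so any row
    # stored on another node has to be shipped over the network.
    local_transfer = sum(1 for owner in row_owner if owner != client)
    local_busiest = len(row_owner)  # a single process does all the work

    # Hadoop mode: a TaskTracker runs on every Cassandra node and each map task
    # is scheduled on the node that holds its split, so the scan moves no rows.
    per_node = Counter(row_owner)
    hadoop_transfer = 0
    hadoop_busiest = max(per_node.values(), default=0)

    return {
        "local": {"rows_over_network": local_transfer, "busiest_worker_rows": local_busiest},
        "hadoop": {"rows_over_network": hadoop_transfer, "busiest_worker_rows": hadoop_busiest},
    }


if __name__ == "__main__":
    # Hypothetical 3-node ring holding 9 rows, 3 per node.
    owners = ["n1", "n2", "n3"] * 3
    print(simulate(owners, client="n1"))
```

With three nodes and nine evenly spread rows, local mode ships 6 rows to `n1` and one process handles all 9, while Hadoop mode ships none during the scan and no worker handles more than 3.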
