I need the SQL equivalent of an AUTO_INCREMENT id in hadoop. When my reduce

Editorial Team

Asked: June 13, 2026 (2026-06-13T09:56:51+00:00)

I need the SQL equivalent of an AUTO_INCREMENT id in hadoop.

When my reduce task identifies a new item, that item needs a unique ID assigned.

How can I share an atomic counter across the cluster? The reporter
counters seem to be increment-only; I don't see a getAndIncrement
feature.
How can I set that counter before the map/reduce phase of the job
starts?

1 Answer

Editorial Team · Answer 1 · 2026-06-13T09:56:52+00:00

To generate IDs in a distributed setting, you can either generate UUIDs or use Apache ZooKeeper, which provides distributed coordination for Hadoop clusters. Disclaimer: I have never used ZooKeeper, so I don't know whether it can really (even theoretically) give you a globally contiguous set of IDs, which is what the question seems to be asking for.

Generating UUIDs does have a cost, though; they take some time to generate.
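
As a rough illustration of the UUID option, here is a minimal sketch using the standard `java.util.UUID` class. The class name `UuidIds` and the helper `newId` are made up for this example and are not part of Hadoop. In a real job the call would sit inside your reducer at the point where a new item is identified. Note that these IDs are unique without any coordination between tasks, but they are random strings, not a contiguous sequence like AUTO_INCREMENT.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class UuidIds {
    // Hypothetical helper: returns a random (version 4) UUID as a 36-character string.
    // No shared state is needed, so every reduce task can call this independently.
    static String newId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Generate a few IDs and count how many distinct values we got.
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < 5; i++) {
            String id = newId();
            seen.add(id);
            System.out.println(id);
        }
        System.out.println("distinct ids: " + seen.size());
    }
}
```

If you need compact numeric IDs or a gap-free sequence, this approach won't give you that; it only guarantees (with overwhelming probability) that no two tasks produce the same ID.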

For good general information on distributed ID generation, see this Stack Overflow question.
