Hadoop writes a SequenceFile in key-value pair (record) format. Consider we have a

Asked: May 27, 2026 (2026-05-27T10:12:55+00:00)

Hadoop writes a SequenceFile in key-value pair (record) format. Suppose we have a large, unbounded log file. Hadoop will split the file based on the block size and store the blocks on multiple data nodes. Is it guaranteed that each key-value pair resides in a single block? Or can the key end up in one block on node 1 and the value (or part of it) in a second block on node 2? If such meaningless splits can occur, what is the solution? Sync markers?

Another question: does Hadoop write sync markers automatically, or do we have to write them manually?

1 Answer

Editorial Team · Answer 1 · 2026-05-27T10:12:56+00:00

I asked this question on the Hadoop mailing list. They answered:

Sync markers are written into sequence files already, they are part of
the format. This is nothing to worry about – and is simple enough to
test and be confident about. The mechanism is the same as reading a text
file with newlines – the reader will read past the split boundary
in order to complete a record if it has to.
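
The newline analogy in this reply can be made concrete with a small sketch. The Python below is not Hadoop code: `split_ranges` and `read_split_lines` are hypothetical names, and the ownership rule (every split except the first discards its first line, and each split keeps reading as long as a line starts at or before its end offset) is an assumption modelled loosely on how Hadoop's text reader is usually described.

```python
def split_ranges(size, block):
    """Cut a file of `size` bytes into consecutive [start, end) byte ranges."""
    return [(s, min(s + block, size)) for s in range(0, size, block)]


def read_split_lines(data, start, end):
    """Return the complete lines owned by the byte range [start, end)."""
    pos = start
    if start != 0:
        # Not the first split: the line in progress at `start` belongs to
        # the previous reader, so skip to just past its newline.
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []
        pos = newline + 1
    lines = []
    # Keep reading while a line STARTS at or before `end`; the last such
    # line may finish well beyond the boundary, and is still read whole.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            newline = len(data)  # final line without a trailing newline
        lines.append(data[pos:newline])
        pos = newline + 1
    return lines


data = b"alpha\nbravo\ncharlie\ndelta\necho\n"
for start, end in split_ranges(len(data), 7):
    # Each line shows up in exactly one split, never cut in half.
    print((start, end), read_split_lines(data, start, end))
```

With a 7-byte block size the boundaries fall in the middle of lines, yet every line is returned whole by exactly one reader, which is the property the reply describes.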

Then I asked:

So if we have a map job analysing only the second block of the log
file, it should not need to transfer any other part of the file from other
nodes, because that block is a standalone, meaningful split? Am I right?

They answered:

Yes. Simply put, your records shall never break. We do not read just
at the split boundaries, we may extend beyond boundaries until a sync
marker is encountered in order to complete a record or series of
records. The subsequent mappers will always skip until their first
sync marker, and then begin reading – to avoid duplication. This is
exactly how text file reading works as well, only there the delimiter is a
newline.
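
The skip-to-sync behaviour described in this reply can be sketched with a toy record format. This is an illustration under stated assumptions, not Hadoop's implementation: `write_records`, `read_split`, the 64-byte `SYNC_INTERVAL` and the way the marker is derived are all hypothetical. The only detail borrowed from the real format is the general shape of a sync entry, a length field of -1 followed by a 16-byte marker.

```python
import hashlib
import struct

SYNC_ESCAPE = struct.pack(">i", -1)                      # "length" of -1 flags a sync entry
SYNC_MARKER = hashlib.sha256(b"toy-file").digest()[:16]  # toy 16-byte marker (assumption)
SYNC_ENTRY = SYNC_ESCAPE + SYNC_MARKER
SYNC_INTERVAL = 64                                       # toy value, chosen for the demo


def write_records(records):
    """Serialize (key, value) byte pairs, adding sync entries automatically."""
    out = bytearray()
    last_sync = 0
    for key, value in records:
        # The writer inserts a sync entry on its own once enough bytes
        # have been written since the previous one.
        if len(out) - last_sync >= SYNC_INTERVAL:
            out += SYNC_ENTRY
            last_sync = len(out)
        out += struct.pack(">i", len(key)) + key
        out += struct.pack(">i", len(value)) + value
    return bytes(out)


def read_split(data, start, end):
    """Return the records owned by the byte range [start, end)."""
    # Every split but the first skips ahead to its first sync entry.
    pos = 0 if start == 0 else data.find(SYNC_ENTRY, start)
    records = []
    while 0 <= pos < len(data):
        if data.startswith(SYNC_ENTRY, pos):
            if pos >= end:
                break  # this sync entry belongs to the next split
            pos += len(SYNC_ENTRY)
        (key_len,) = struct.unpack_from(">i", data, pos)
        key = data[pos + 4:pos + 4 + key_len]
        pos += 4 + key_len
        (value_len,) = struct.unpack_from(">i", data, pos)
        value = data[pos + 4:pos + 4 + value_len]
        pos += 4 + value_len
        records.append((key, value))
    return records


records = [(b"k%d" % i, b"log line %d" % i) for i in range(20)]
blob = write_records(records)
for start in range(0, len(blob), 100):
    # Readers run past `end` until the next sync entry; none returns a partial record.
    print(start, len(read_split(blob, start, min(start + 100, len(blob)))))
```

A reader may return no records at all when its byte range contains no sync entry; the records in that range are then picked up by the reader of an earlier split, which keeps going until it meets a sync entry at or past its own end offset.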
