I want to run a hadoop unit test, using the local filesystem mode… I


Asked: May 31, 2026 (2026-05-31T19:07:26+00:00)


I want to run a Hadoop unit test using the local filesystem mode. Ideally, I would like to see several part-m-* files written out to disk (rather than just one). However, since it is just a test, I don't want to process 64 MB of data (the default block size is about 64 MB, I believe).

In distributed mode we can set this using

dfs.block.size

I am wondering whether there is a way to get my local file system to write small part-m files out, so that my unit test mimics the contents of large-scale data with several (albeit very small) files.


1 Answer

Editorial Team · Answer 1 · 2026-05-31T19:07:27+00:00

Assuming your input format can handle splittable files (see the org.apache.hadoop.mapreduce.lib.input.FileInputFormat.isSplitable(JobContext, Path) method), you can adjust the input split size so that a smaller file is processed by multiple mappers (I'm going to assume you're using the new-API mapreduce package).

For example, if you're using TextInputFormat (or most input formats that extend FileInputFormat), you can call the static utility methods:

FileInputFormat.setMaxInputSplitSize(Job, long)
FileInputFormat.setMinInputSplitSize(Job, long)

The long argument is the size of the split in bytes, so just set it to your desired size.

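Under the hood, these methods set the following job configuration properties (renamed in Hadoop 2 and later to mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize):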

mapred.min.split.size
mapred.max.split.size

One final note: some input formats may override the FileInputFormat.getFormatMinSplitSize() method (which defaults to 1 byte for FileInputFormat), so be wary if you set a value and Hadoop appears to ignore it.
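To make the interaction between the minimum, the maximum and the block size concrete, here is a rough, Hadoop-free sketch of the arithmetic as I understand it from FileInputFormat: the split size is max(minSize, min(maxSize, blockSize)), and the effective minimum is the larger of the format's own minimum and the one you configured. The class name SplitSizeSketch and its methods are hypothetical helpers for illustration, not Hadoop APIs. The 32 MB block size and the 1000-byte file are assumed values, and the sketch only covers a single non-empty, splittable file.

```java
// Simplified sketch of how FileInputFormat appears to size its splits.
// SplitSizeSketch and its methods are hypothetical; they are not part of Hadoop.
final class SplitSizeSketch {

    // Hadoop lets the final split run up to 10% over the split size.
    private static final double SPLIT_SLOP = 1.1;

    // Mirrors the formula in FileInputFormat.computeSplitSize(blockSize, minSize, maxSize).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // The effective minimum is the larger of the format's minimum and the configured minimum.
    static long effectiveSplitSize(long blockSize, long formatMinSize,
                                   long configuredMinSize, long maxSize) {
        long minSize = Math.max(formatMinSize, configuredMinSize);
        return computeSplitSize(blockSize, minSize, maxSize);
    }

    // Counts the splits produced for one non-empty, splittable file.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long bytesRemaining = fileLength;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++; // the tail that is left over becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 32L * 1024 * 1024; // assumed block size for the local file system
        long fileLength = 1000L;            // assumed size of a small test input, in bytes

        // Max split size of 256 bytes, format minimum left at 1 byte.
        long capped = effectiveSplitSize(blockSize, 1L, 1L, 256L);
        System.out.println(capped + " -> " + countSplits(fileLength, capped)); // 256 -> 4

        // Same setting, but the input format insists on a 2000-byte minimum.
        long overridden = effectiveSplitSize(blockSize, 2000L, 1L, 256L);
        System.out.println(overridden + " -> " + countSplits(fileLength, overridden)); // 2000 -> 1
    }
}
```

In a map-only job each split becomes one map task and so one part-m-* file, which is why the first case would give four small output files. The second case shows how a format-level minimum can silently win over the maximum you set.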

A final point: have you considered MRUnit (http://incubator.apache.org/mrunit/) for actual 'unit' testing of your MR code?
