Share
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
This is dependent on the InputFormat, which for most file-based formats is defined in the
FileInputFormatbase class.There are a number of configurable options which denote how hadoop will take a single file and either process it as a single split, or divide the file into multiple splits:
InputFormat.isSplittable()implementation for your input format for more informationmapred.min.split.sizeandmapred.max.split.sizewhich help the input format when breaking up blocks into splits. Note that the minimum size may be overriden by the input format (which may have a fixed minumum input size)If you want to know more, and are comfortable looking through the source, check out the
getSplits()method inFileInputFormat(both the new and old api have the same method, but they may have some suttle differences).