
The Archive Base


Editorial Team
Asked: June 7, 2026


This might seem like a silly question, but in Hadoop, suppose the block size is X (typically 64 or 128 MB) and a local file's size is Y, where Y is less than X. When I copy file Y to HDFS, will it consume one block, or will Hadoop create smaller blocks?


1 Answer

  1. Editorial Team
    Added an answer on June 7, 2026 at 3:05 am

    Hadoop consumes one block for the file. That does not mean the file consumes a full block's worth of storage capacity.

    When you browse HDFS from the web interface, the listing looks like this:

    Name        Type    Size        Replication  Block Size  Modification Time
    filename1   file    48.11 KB    3            128 MB      2012-04-24 18:36
    filename2   file    533.24 KB   3            128 MB      2012-04-24 18:36
    filename3   file    303.65 KB   3            128 MB      2012-04-24 18:37

    Each of these files is only tens or hundreds of KB, far smaller than the 128 MB block size. HDFS capacity is consumed according to each file's actual size, but each file still takes up a block of its own.
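To make the distinction between blocks and capacity concrete, here is a small Python sketch using the three files from the listing. It assumes the listed sizes are in KiB (1 KB = 1024 bytes), applies the listed replication factor of 3 to every file, and counts blocks as file size divided by block size, rounded up. It only illustrates the arithmetic; it does not call any Hadoop API.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB block size, as in the listing
REPLICATION = 3                  # replication factor, as in the listing

# Approximate file sizes from the listing, converted from KB to bytes
# (assumes 1 KB = 1024 bytes).
file_sizes = {
    "filename1": int(48.11 * 1024),
    "filename2": int(533.24 * 1024),
    "filename3": int(303.65 * 1024),
}

def blocks_for(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Blocks a file occupies: size / block size, rounded up (0 for an empty file)."""
    return math.ceil(size_bytes / block_size)

# Each small file still gets a block of its own.
total_blocks = sum(blocks_for(size) for size in file_sizes.values())

# Raw disk actually used: real bytes times replication, not whole blocks.
disk_used = sum(size * REPLICATION for size in file_sizes.values())

# Hypothetical figure if every block were padded to the full block size
# (HDFS does not do this; it is shown only for comparison).
if_padded = total_blocks * BLOCK_SIZE * REPLICATION

print(f"blocks: {total_blocks}")                        # blocks: 3
print(f"disk used: {disk_used / 1024 / 1024:.2f} MB")   # disk used: 2.59 MB
print(f"if padded: {if_padded / 1024 / 1024:.0f} MB")   # if padded: 1152 MB
```

Three blocks are used, but the disk footprint is only a few MB rather than the 1152 MB that three replicated full blocks would imply.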

    The number of blocks available is limited, depending on the capacity of HDFS, so small files waste blocks: you can run out of blocks before using all of the actual storage capacity. Unix filesystems also have the concept of a block size, but it is very small, around 512 bytes. HDFS inverts this by keeping its block size large, around 64-128 MB.
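The effect on block counts can be sketched as follows. The workload is hypothetical: 10,000 files of 100 KB each (about 977 MB in total), stored either as separate files or concatenated into one file, using the 128 MB HDFS block size and the 512-byte block size mentioned above. The helper name `blocks_needed` is illustrative only.

```python
import math

HDFS_BLOCK = 128 * 1024 * 1024  # 128 MB HDFS block size
UNIX_BLOCK = 512                # small local-filesystem block size from the text

def blocks_needed(file_sizes, block_size):
    """Total blocks when each file is stored separately; a file's last block may be partial."""
    return sum(math.ceil(size / block_size) for size in file_sizes)

# Hypothetical workload: 10,000 files of 100 KB each.
small_files = [100 * 1024] * 10_000
one_big_file = [sum(small_files)]  # the same bytes concatenated into one file

small_hdfs = blocks_needed(small_files, HDFS_BLOCK)
big_hdfs = blocks_needed(one_big_file, HDFS_BLOCK)
big_unix = blocks_needed(one_big_file, UNIX_BLOCK)

print(small_hdfs)  # 10000: one block per small file
print(big_hdfs)    # 8: same data, far fewer blocks
print(big_unix)    # 2000000: tiny blocks mean very many of them
```

The same 977 MB costs 10,000 HDFS blocks as small files but only 8 as one file.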

    Another issue is that a MapReduce job tries to spawn one mapper per block, so processing these three small files may end up spawning three mappers.
    This wastes resources when the files are small, and it adds latency, because each mapper takes time to start and then works on only a tiny file. Compacting small files into files closer to the block size lets fewer mappers process the same data.
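A rough way to see the mapper cost is to count input splits. The sketch below is a simplification: it assumes the split size equals the block size, that each file is split on its own so splits never span files (the default per-file behavior of Hadoop's FileInputFormat), and, for the compacted case, that all bytes are packed into full-size splits. It ignores the split-size slop factor, empty files, and data locality. The function names are hypothetical.

```python
import math
from typing import List

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB; assumed to equal the split size

def per_file_splits(sizes: List[int], split_size: int = BLOCK_SIZE) -> int:
    """Simplified per-file splitting: each file is split on its own,
    so every small file needs at least one split (and one mapper)."""
    return sum(math.ceil(size / split_size) for size in sizes)

def packed_splits(sizes: List[int], split_size: int = BLOCK_SIZE) -> int:
    """Best case after compaction: all bytes packed into full-size splits."""
    return math.ceil(sum(sizes) / split_size)

# The three files from the listing above (approximate sizes in bytes).
sizes = [int(48.11 * 1024), int(533.24 * 1024), int(303.65 * 1024)]

print(per_file_splits(sizes))  # 3 mappers, each reading well under 1 MB
print(packed_splits(sizes))    # 1 mapper for the same data
```

The gap widens with the number of files: 10,000 files of 100 KB give 10,000 splits per file, but only 8 once packed.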

    Yet another issue with numerous small files is the load they put on the NameNode, which keeps the metadata (the mapping of each file to its blocks and of each block to its locations) in main memory. With many small files this table fills up faster, and more main memory is required as the metadata grows.
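For a back-of-the-envelope estimate of the NameNode cost, the Cloudera post in the reading list gives a rule of thumb of about 150 bytes of NameNode memory per file, directory, or block object. The sketch applies that approximation to a hypothetical 10 million files of 100 KB, each occupying one block, and compares it with the same data compacted into files of at most 128 MB. Directories are ignored, and the helper `namenode_bytes` is illustrative only.

```python
import math

BYTES_PER_OBJECT = 150          # rule-of-thumb NameNode memory per file/block object
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB

def namenode_bytes(num_files: int, blocks_per_file: int) -> int:
    """Estimated NameNode heap for file and block objects (directories ignored)."""
    objects = num_files + num_files * blocks_per_file
    return objects * BYTES_PER_OBJECT

# Hypothetical workload: 10 million files of 100 KB, one block each.
num_small = 10_000_000
small_size = 100 * 1024
small_estimate = namenode_bytes(num_small, 1)

# The same bytes compacted into files of (at most) one block each.
num_compacted = math.ceil(num_small * small_size / BLOCK_SIZE)
compacted_estimate = namenode_bytes(num_compacted, 1)

print(f"{small_estimate / 1e9:.1f} GB")      # 3.0 GB
print(f"{compacted_estimate / 1e6:.1f} MB")  # 2.3 MB
```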

    Read the following for reference:

    1. http://www.cloudera.com/blog/2009/02/the-small-files-problem/
    2. http://www.ibm.com/developerworks/web/library/wa-introhdfs/
    3. Stack Overflow discussion: Small files and HDFS blocks


Related Questions

This might seem as silly question but I am thinking there might be the
I know this question might be little silly but I can't seem to find
This might seem like a silly question but valgrind doesn't by default give you
This might seem to be a silly question at first, but please read on.
This question might seem really silly to most of the enlightened folks here. But
This might seem like a silly question, but after asking some questions on stackoverflow
This question might seem silly, but what's the difference between accessing an element (with
This might be a silly question, but I can't seem to find the answer
This might seem like a silly question, but I downloaded the Reactive Extensions for
This is might seem to be a sort of silly question to ask but



© 2021 The Archive Base. All Rights Reserved