I have a question that is confusing me. My task is to work out fragmentation.
stat() for a file:
st_size = 10520
st_blksize = 4096
st_blocks = 24
I have read in some places that st_blksize is the general block size of the file system, which in this case is 4096, so the file would fit into 3 blocks. But st_blocks counts 512-byte units: 10520 / 512 is about 20.5, meaning that about 3.5 of the 24 allocated 512-byte blocks are unused space, even though they are allocated. Does this mean that there are 12288 - 10520 = 1768 unused bytes in this file (fragmentation)?
As I mentioned, I have read into this a fair bit and found a lot of contradictory texts. I would like someone to clear this up once and for all!
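The arithmetic here can be checked with a short sketch. It assumes the Linux convention that st_blocks is counted in 512-byte units; the helper names allocated_bytes and slack_bytes are made up for illustration. What it measures is slack (allocated but unused space), which is not the same thing as blocks being scattered across the disk.

```python
SECTOR = 512  # assumption: st_blocks is counted in 512-byte units (Linux)


def allocated_bytes(st_blocks: int) -> int:
    """Bytes the filesystem reports as allocated to the file."""
    return st_blocks * SECTOR


def slack_bytes(st_size: int, st_blocks: int) -> int:
    """Allocated bytes minus file length.

    Can be negative for sparse files, and on some filesystems st_blocks
    may also count metadata blocks, so treat the result as an estimate.
    """
    return allocated_bytes(st_blocks) - st_size


# Values from the stat() output above.
st_size, st_blksize, st_blocks = 10520, 4096, 24

print(allocated_bytes(st_blocks))       # 12288
print(slack_bytes(st_size, st_blocks))  # 1768 (3.5 * 512 = 1792 is a rounded estimate)
print(-(-st_size // st_blksize))        # 3 blocks of 4096 bytes (ceiling division)
```

For a real file, the same fields are available from os.stat(path) as st_size and st_blocks on Unix-like systems.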
I don’t think your project is really solvable at the stat(2) API layer. Consider the case of a file 4096 bytes long. Presume it was created by iteratively appending 512-byte blocks over and over again. Presume that the filesystem was completely full, except for one 512-byte block, for each and every write. Presume that the 512-byte block available for each write was located in a randomly available spot on the disk.

This file is 100% fragmented: no two blocks are near each other.
And yet, a measure based solely on the stat(2) variables might well show that there are no wasted blocks anywhere in the file.

When trying to track down an answer to your actual question, I got as far as ext3_write_begin() before being called away. I hope this is a useful starting point for your search.

Update
If you’re interested in finding fragmentation, I think the place to start is the bmap command from the debugfs(8) program. For the file sars_first_radio_show.zip, the first ten blocks it reports aren’t all contiguous: 944417{52,81,82,83,84}, 94459905, 951260{19,20,21,22}.

You could either script an answer around debugfs(8) or use the libext2fs library routines yourself. It would be a significant step up in complexity compared to the stat(2) exercises you were going through, but the answers would mean something, rather than just be a vague guess.
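As a rough idea of what such a script might do once it has the physical block numbers, here is a minimal sketch that groups them into contiguous runs. It works only on the ten block numbers quoted above; how they are obtained from debugfs(8) is left out, and contiguous_runs is a hypothetical helper, not part of any library.

```python
def contiguous_runs(blocks):
    """Group physical block numbers (given in logical order) into runs.

    Each run is a (first_block, last_block) tuple of consecutive blocks.
    """
    runs = []
    for block in blocks:
        if runs and block == runs[-1][1] + 1:
            runs[-1][1] = block            # extends the current run
        else:
            runs.append([block, block])    # starts a new run
    return [tuple(run) for run in runs]


# The first ten blocks of the example file, expanded from the brace notation.
blocks = [
    94441752,
    94441781, 94441782, 94441783, 94441784,
    94459905,
    95126019, 95126020, 95126021, 95126022,
]

runs = contiguous_runs(blocks)
print(runs)
print(len(runs))  # 4 runs for 10 blocks
```

A single run would mean those blocks are laid out contiguously; the more runs per block, the more scattered the file is.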