I was doing some experiments to learn more about Linux process states.
So, there’s a directory (named big_dir) with over a billion files in it (the directory has many sub-directories, recursively). I ran tar -cv big_dir | ssh anotherServer "tar -xv -C big_dir" and found via top that the tar process stays in the D state. Meanwhile, the tar command keeps printing the paths of the files.
I know that the process was in the D state because it was doing disk I/O, but why didn’t its state keep switching between D and R? Printing the file names under the directory must have used some CPU time, mustn’t it? Otherwise, how could the tar command know that it should print something?
If I run dd if=/dev/zero of=/dev/null, the dd process stays in the R state in the top output. But why wasn’t it in the D state? Wasn’t it doing I/O all the time?
/dev/zero and /dev/null are pseudo-devices, so there’s no physical device behind them. If I do
then
top does show me dd in the D state. However, it does spend a lot of its time in R (in terms of CPU time). top simply samples the process table, so you may need to watch it for some time in order to see transient states.
I suspect that, for your tar example above, the time spent writing to stdout is negligible compared to the disk time. Note also that writing to stdout involves the windowing system drawing the output, and while it’s doing that the process will be sleeping. For example, I’m running yes right now, and the majority of the work is being performed by my X server. The yes process is sleeping for most of the time I’m watching it (via top).
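Since top only samples, one way to catch the transient states is to sample much more often yourself and count what you see. The following is a minimal sketch, assuming Linux with /proc mounted and a sleep that accepts fractional seconds (GNU coreutils does); sample_states is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# sample_states PID [COUNT] [INTERVAL]
# Hypothetical helper: poll the state of PID and print how many times
# each state letter (R, S, D, ...) was observed.
sample_states() {
    pid=$1
    count=${2:-100}       # number of samples to take
    interval=${3:-0.05}   # seconds between samples (fractional needs GNU sleep)
    i=0
    while [ "$i" -lt "$count" ]; do
        # /proc/PID/status has a line such as "State:  D (disk sleep)";
        # the second field is the single-letter state.
        state=$(awk '/^State:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
        [ -n "$state" ] || break   # process is gone (or /proc is unavailable)
        echo "$state"
        sleep "$interval"
        i=$((i + 1))
    done | sort | uniq -c
}
```

For instance, you could start dd if=/dev/zero of=/dev/null in the background, run sample_states "$!" 200 0.01, and then kill the dd. The counts are still only samples, so a state the process passes through very briefly may be under-represented or missed altogether.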