I read /proc/<pid>/io to measure the I/O activity of SQL queries, where <pid> is the PID of the database server. I read the values before and after each query and take the difference to get the number of bytes the query caused to be read and/or written.
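A minimal sketch of this before/after measurement in Python, for illustration only. The helper names (`parse_proc_io`, `read_proc_io`, `io_delta`) are made up for this example, and it assumes the measuring process is allowed to read the server's /proc/<pid>/io (typically the same user or root):

```python
from pathlib import Path


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def read_proc_io(pid):
    """Take a snapshot of all I/O counters of the given process (Linux only)."""
    return parse_proc_io(Path(f"/proc/{pid}/io").read_text())


def io_delta(before, after):
    """Per-counter difference between two snapshots, in bytes or syscalls."""
    return {key: after[key] - before[key] for key in after if key in before}


def to_mib(num_bytes):
    """Convert a byte count to MiB for reporting."""
    return num_bytes / (1024 * 1024)


# Usage (server_pid and run_query are hypothetical placeholders):
#   before = read_proc_io(server_pid)
#   run_query("tpch_q01.sql")
#   delta = io_delta(before, read_proc_io(server_pid))
#   print(to_mib(delta["rchar"]), to_mib(delta["read_bytes"]))
```

Note that the field names in the file itself are lowercase (`rchar`, `read_bytes`).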
As far as I know, the field READ_BYTES counts actual disk I/O, while RCHAR includes more, such as reads that could be satisfied from the Linux page cache (see Understanding the counters in /proc/[pid]/io for clarification).
This leads to the assumption that RCHAR should be equal to or greater than READ_BYTES, but my results contradict this assumption.
I could imagine some minor block or page overhead explaining the results I get for Infobright ICE (values in MB):
Query       |    RCHAR| READ_BYTES|
tpch_q01.sql| 34.44180|   34.89453|
tpch_q02.sql|  2.89191|    3.64453|
tpch_q03.sql| 32.58994|   33.19531|
tpch_q04.sql| 17.78325|   18.27344|
But I completely fail to understand the I/O counters for MonetDB (values in MB):
Query       |    RCHAR| READ_BYTES|
tpch_q01.sql|  0.07501|  220.58203|
tpch_q02.sql|  1.37840|   18.16016|
tpch_q03.sql|  0.08272|  162.38281|
tpch_q04.sql|  0.06604|   83.25391|
Am I wrong in assuming that RCHAR includes READ_BYTES? Is there a way to bypass the kernel's counters that MonetDB could be using? What is going on here?
I might add that I clear the page cache and restart the database server before each query.
I’m on Ubuntu 11.10, running kernel 3.0.0-15-generic.
I can only think of two things, both going by the kernel documentation for these counters:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/proc.txt;hb=HEAD#l1305
1:
I read “Caused to be fetched from the storage layer” to include readahead and similar prefetching, which could push READ_BYTES slightly above RCHAR.
2:
Note that the documentation says nothing about disk access via memory-mapped files. I think this is the more likely reason: your MonetDB probably mmaps its database files and then does all its work on the mappings. Data paged in that way never passes through read(), so it would not show up in RCHAR.
I’m not really sure how you could measure the bandwidth used through mmap, given its nature.
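The mmap explanation can be checked from user space with a small experiment. The following Python sketch is only an illustration under some assumptions: it runs on Linux with /proc/self/io available, the function names are made up for this example, and it looks at RCHAR only (the temporary file is freshly written and therefore already in the page cache, so READ_BYTES is not expected to move here).

```python
import mmap
import os
import tempfile
from pathlib import Path


def rchar_of_self():
    """Return this process's rchar counter from /proc/self/io (Linux only)."""
    for line in Path("/proc/self/io").read_text().splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "rchar":
            return int(value)
    raise RuntimeError("rchar not found in /proc/self/io")


def compare_mmap_and_read(size=1 << 20):
    """Read the same file once via mmap and once via read(); report rchar deltas."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * size)
        path = tmp.name
    try:
        # Touch one byte per page through a mapping: no read() syscalls on the file.
        before = rchar_of_self()
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                touched = sum(mm[i] for i in range(0, size, mmap.PAGESIZE))
        mmap_delta = rchar_of_self() - before

        # Read the whole file with an ordinary read().
        before = rchar_of_self()
        with open(path, "rb") as f:
            data = f.read()
        read_delta = rchar_of_self() - before
    finally:
        os.unlink(path)
    return {"mmap_rchar": mmap_delta, "read_rchar": read_delta}


if __name__ == "__main__":
    print(compare_mmap_and_read())
```

On a typical Linux system the read() pass should raise RCHAR by at least the file size, while the mmap pass should raise it only by the few bytes spent reading /proc/self/io itself.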