I was reading the Erlang documentation about file I/O and saw this:
On operating systems with thread support, it is possible to let file
operations be performed in threads of their own, allowing other Erlang
processes to continue executing in parallel with the file operations.
See the command line flag +A in erl(1).
So I expected that adding asynchronous threads would reduce the time an I/O operation takes.
Instead, when I tried running erl +A1, erl +A6 or erl +A12 (on a 6-core machine), the time required to write to a file increased 5-10 times.
I used timer:tc/3 to measure the time and tried three ways of writing: io:write/2, file:write/2 (with the term converted to a binary), and file:write/2 on a file opened with the raw flag. The term was ~170 KB in size and was written 1000 times. I used R14B04 (but got similar results with R15A too).
Am I doing something wrong, either in utilizing the asynchronous IO or in measuring its efficiency?
Could it be that the overhead introduced by passing the term (perhaps because it is small) outweighs the speedup gained?
The (not so elegant :$) code:
-module(test).
-compile(export_all).
test()->
{ok,F}=file:open(foo,[raw,write]), % or just [write]
{T,ok}=timer:tc(test,t,[F,1000]),
file:close(F),
T.
t(_,0)->ok;
t(F,A)->
B=dsafasfagafssadagfsdsaasdfdsafasfagafssadagfsdsaasdfdsafasfagafssadagfsdsaasdfdsafasfagafssadagfsdsaasdfdsafasfagafssadagfsdsaasdfagafssadagfsdsaasdfdsafasfagafssadagfsdsaasdfdsafasfagafssadagfsdsaasdfdsafasfagafssadagfsdsaasdf,
file:write(F,
term_to_binary([B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B,B])),
%or io:write(F,[B,..])
t(F,A-1).
I am interested in minimizing I/O overhead (basically just output) because I want to write some data to disk while profiling; that's why sending the data to another process that does the writing doesn't help (unless I could somehow dedicate a core to that process). So far, the best method seems to be opening a raw file, accumulating data and then writing it in one go; any tips would be appreciated 🙂
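For the "accumulate, then write" approach, here is a minimal sketch of one way to let the file module do the buffering, using the {delayed_write, Size, Delay} option of file:open/2. The module name buffered_write, the function run/3, and the 1 MB / 2000 ms values are assumptions picked for illustration, not tuned settings. The term is encoded once outside the timed loop so that only the writes are measured; in real profiling code each term would of course be different.

```erlang
-module(buffered_write).
-export([run/3]).

%% Write the binary encoding of Term to Path N times and return the
%% elapsed time in microseconds. Writes are buffered and flushed in
%% large chunks instead of once per call.
run(Path, Term, N) ->
    Bin = term_to_binary(Term),            % encode once, outside the timing
    {ok, F} = file:open(Path, [raw, write, binary,
                               %% illustrative values: flush at 1 MB or 2 s
                               {delayed_write, 1024 * 1024, 2000}]),
    {Micros, ok} = timer:tc(fun() -> loop(F, Bin, N) end),
    ok = file:close(F),                    % flushes whatever is still buffered
    Micros.

loop(_F, _Bin, 0) -> ok;
loop(F, Bin, N) ->
    ok = file:write(F, Bin),
    loop(F, Bin, N - 1).
```

Something like buffered_write:run("foo", Term, 1000) can then be compared against the unbuffered version with the same term.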
When I run the test I get about the same performance with +A1 as without. This is on OS X Snow Leopard with a dual-core processor. It is also what I would expect when using async threads.
Adding async threads only provides additional threads to do I/O on. That increases the performance of parallel I/O jobs, and it also lets normal Erlang processes run faster, because the scheduler threads are not busy doing I/O.
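To rule out that the flag simply didn't take effect, you can ask the emulator how many async threads it actually has. A small sketch (the module name async_info and the function report/0 are made up for this example; the erlang:system_info/1 items are the standard ones):

```erlang
-module(async_info).
-export([report/0]).

%% Return a property list describing the threading setup of the
%% running emulator.
report() ->
    [{thread_support, erlang:system_info(threads)},          % true | false
     {async_threads,  erlang:system_info(thread_pool_size)}, % value of +A
     {schedulers,     erlang:system_info(schedulers)}].
```

Start the shell with, for example, erl +A 6 and call async_info:report(); the async_threads entry should reflect the value passed to +A.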
If you run a test with many parallel jobs you should see a performance gain from using async threads.
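A rough sketch of such a parallel test, in case it helps: it spawns several writer processes, each writing to its own raw file, and measures the wall-clock time until all of them are done. The module name parallel_write, the function run/3 and the par_N.tmp file names are assumptions for the example. Whether you actually see a difference will depend on the release (this assumes file I/O goes through the async thread pool, as in the R14/R15 releases discussed here), the disk and the OS.

```erlang
-module(parallel_write).
-export([run/3]).

%% Spawn Jobs processes; each writes Bin to its own file Count times.
%% Returns the wall-clock time in microseconds until all have finished.
run(Jobs, Count, Bin) ->
    Parent = self(),
    {Micros, ok} = timer:tc(fun() ->
        Pids = [spawn_link(fun() -> writer(Parent, J, Count, Bin) end)
                || J <- lists:seq(1, Jobs)],
        wait(Pids)
    end),
    Micros.

%% One writer: open its own file, write, close, report back.
writer(Parent, J, Count, Bin) ->
    Path = "par_" ++ integer_to_list(J) ++ ".tmp",
    {ok, F} = file:open(Path, [raw, write, binary]),
    ok = loop(F, Bin, Count),
    ok = file:close(F),
    Parent ! {done, self()}.

loop(_F, _Bin, 0) -> ok;
loop(F, Bin, N) ->
    ok = file:write(F, Bin),
    loop(F, Bin, N - 1).

%% Block until every writer has sent its done message.
wait([]) -> ok;
wait([Pid | Rest]) ->
    receive {done, Pid} -> wait(Rest) end.
```

Run the same call, e.g. parallel_write:run(8, 1000, binary:copy(<<0>>, 170000)), once in a shell started with erl +A 0 and once with erl +A 8, and compare the two times.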
Why you are seeing a performance decrease in your sequential tests is a mystery, though.