I need to write bytes of an IEnumerable<byte> to a file.
I can convert it to an array and use the Write(byte[]) method:
    using (var stream = File.Create(path))
        stream.Write(bytes.ToArray());
But since IEnumerable doesn’t provide the collection’s item count, using ToArray is not recommended unless it’s absolutely necessary.
So I can just iterate the IEnumerable and use WriteByte(byte) in each iteration:
    using (var stream = File.Create(path))
        foreach (var b in bytes)
            stream.WriteByte(b);
I wonder which one will be faster when writing lots of data.
I guess using Write(byte[]) sets the buffer according to the array size so it would be faster when it comes to arrays.
My question is: when I just have an IEnumerable<byte> that holds megabytes of data, which approach is better? Converting it to an array and calling Write(byte[]), or iterating it and calling WriteByte(byte) for each byte?
Enumerating over a large stream of bytes adds tons of overhead to something that is normally cheap: copying bytes from one buffer to the next.
Normally, LINQ-style overhead does not matter much, but when you are processing 100 million bytes per second on a normal hard drive, the overhead becomes severe. This is not premature optimization. We can foresee that this will be a performance hotspot, so we should optimize it eagerly.
So when copying bytes around you probably should not rely on abstractions like IEnumerable and IList at all. Pass around arrays or ArraySegment<byte> values, which also contain Offset and Count. This frees you from slicing arrays too often.
Another deadly sin with high-throughput IO is calling a method per byte, such as reading bytewise and writing bytewise. This kills performance because these methods have to be called hundreds of millions of times per second. I have experienced that myself.
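To illustrate the ArraySegment<byte> suggestion above, here is a minimal sketch. The helper name WriteSegment and the sample data are made up for the example; only ArraySegment<byte> and Stream.Write(byte[], int, int) come from the framework.

```csharp
using System;
using System.IO;

// Hypothetical helper: writes only the region described by the segment.
static void WriteSegment(Stream stream, ArraySegment<byte> segment)
{
    // Array, Offset and Count describe the slice without copying it.
    stream.Write(segment.Array, segment.Offset, segment.Count);
}

var data = new byte[] { 1, 2, 3, 4, 5, 6, 7, 8 };

// A view over the bytes 3, 4, 5, 6; no new array is allocated.
var middle = new ArraySegment<byte>(data, 2, 4);

using var target = new MemoryStream();
WriteSegment(target, middle);
```

The segment travels through the call chain as a single value, so callers do not have to copy a sub-array just to hand over part of a buffer.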
Always process entire buffers of at least 4096 bytes at a time. Depending on the media you are doing IO with, you can use much larger buffers (64 KB, 256 KB or even megabytes).
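If the IEnumerable<byte> cannot be replaced at the source, one possible middle ground is to drain it into a fixed-size buffer and write whole buffers. The following is only a sketch under that assumption: WriteChunked and its 64 KB default are names and values chosen for illustration, not an existing API. It still pays the per-item enumeration cost, but it avoids both the full ToArray copy and one stream call per byte.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: fills a reusable buffer from the sequence and issues
// one Write call per full buffer instead of one WriteByte call per byte.
static void WriteChunked(Stream stream, IEnumerable<byte> bytes, int bufferSize = 64 * 1024)
{
    var buffer = new byte[bufferSize];
    int filled = 0;

    foreach (byte b in bytes)
    {
        buffer[filled++] = b;
        if (filled == buffer.Length)
        {
            stream.Write(buffer, 0, filled);
            filled = 0;
        }
    }

    // Write whatever is left in the last, partially filled buffer.
    if (filled > 0)
        stream.Write(buffer, 0, filled);
}

// Usage with an in-memory stream; a stream from File.Create(path) works the same way.
IEnumerable<byte> source = Enumerable.Range(0, 100_000).Select(i => (byte)(i % 256));
using var output = new MemoryStream();
WriteChunked(output, source, bufferSize: 4096);
```

Memory use stays bounded by the buffer size no matter how long the sequence is, which is the main difference from calling ToArray first.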