I’ve got a TCP server written in C# that processes POST data sent to it. Currently it works fine unless a large amount of data (i.e. greater than 1 GB) is sent to it, at which point it runs out of memory (I store it all in memory as an array of bytes, with an intermediary of a List DTO). For large files I now stream down to disk and then pass the filename around, with the intention of streaming the data back from disk.
Currently all of my routines are written to expect byte arrays which, in hindsight, was a little short-sighted. If I just convert the byte array to a `MemoryStream`, will it double the memory usage? Am I right in thinking that rewriting my code to work on a `MemoryStream` will allow me to reuse it when I’m reading a stream from disk?
Sorry for the stupid questions – I’m never sure when C# takes a copy of the data and when it just takes a reference.
If you pass a `byte[]` to the `MemoryStream` constructor, the stream wraps the existing array directly – no copy is made in the constructor – so inherently there is no “doubling”. (And if you instead build the data up by writing into an empty `MemoryStream`, the original `byte[]` can be garbage collected as soon as you release it; better still, set the capacity correctly to start with and write directly to the `Stream` rather than to a `byte[]` at all.) I would definitely say: switch to
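To illustrate the no-copy behaviour, here is a minimal sketch (the array contents are arbitrary): writing through a `MemoryStream` built from an existing `byte[]` mutates that same array, which shows the constructor wrapped the array rather than duplicating it.

```csharp
using System;
using System.IO;

class WrapDemo
{
    static void Main()
    {
        byte[] payload = { 1, 2, 3, 4 };

        // The byte[] constructor wraps the existing array; no copy is made.
        using var ms = new MemoryStream(payload);

        // Writing through the stream mutates the same underlying array...
        ms.WriteByte(99);

        // ...so the change is visible via the original reference.
        Console.WriteLine(payload[0]); // prints 99
    }
}
```

(Note that this constructor form is non-resizable: the stream cannot grow past the array’s length.)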
`Stream` (but only expose `Stream` in the API – nothing more specific; your consuming code doesn’t need to know which concrete type it is given). Most importantly, you can then choose to use a `NetworkStream` (to read directly from the socket), a `FileStream` (if you want to buffer to disk), or a `MemoryStream` (if you want to buffer in-process). You will also need to make sure you read that volume of data via stream-based code. Iterator blocks (`yield return`) can be very helpful here, as can the LINQ `Enumerable` methods (except for `OrderBy`, `GroupBy`, etc., which buffer everything). Neither passing a
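As a sketch of that API shape (the method name `ProcessPost` and the 81920-byte chunk size are illustrative assumptions, not from the question), a consumer that takes the base `Stream` type works unchanged against a `NetworkStream`, `FileStream`, or `MemoryStream`, and its memory use stays flat regardless of payload size:

```csharp
using System;
using System.IO;

static class PostProcessor
{
    // Accepts only the base Stream type, so callers can hand it a
    // NetworkStream, FileStream, or MemoryStream interchangeably.
    public static long ProcessPost(Stream input)
    {
        var buffer = new byte[81920]; // fixed-size chunk; memory use stays flat
        long total = 0;
        int read;

        // Stream.Read returns 0 at end-of-stream; anything less than
        // buffer.Length before that simply means a partial chunk.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // ... handle buffer[0..read) here ...
            total += read;
        }
        return total;
    }
}
```

The caller then decides where the bytes live: `ProcessPost(networkStream)` for direct socket reads, or `ProcessPost(File.OpenRead(path))` for the buffered-to-disk case.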
`byte[]` nor passing a `Stream` causes anything to get copied, as they are reference types – the only thing copied is the reference itself (4 or 8 bytes, depending on x86/x64).
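A quick toy sketch of those reference semantics: a method receiving a `Stream` parameter gets a copy of the reference only, so anything it does to the stream is visible to the caller, and no buffered data is duplicated.

```csharp
using System;
using System.IO;

class ReferenceDemo
{
    // Receiving a Stream copies only the reference, never the buffered data.
    static void Advance(Stream s) => s.Position = 3;

    static void Main()
    {
        var ms = new MemoryStream(new byte[] { 10, 20, 30, 40 });
        Advance(ms);

        // The caller sees the change: both variables refer to one object.
        Console.WriteLine(ms.Position); // prints 3
    }
}
```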