I’m no expert on the formats, but I’m guessing that for certain input data the compressed output can actually be longer than the input, due to formatting overhead.
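A quick way to check this guess, sketched as C# top-level statements (C# 9 or later) against a modern .NET DeflateStream: pseudo-random bytes are effectively incompressible. The Deflate format can fall back to uncompressed ("stored") blocks, but each block still carries a small header, so the output should come out slightly longer than the input. The exact size depends on the implementation. The helper name Deflate and the 1,000-byte input size are arbitrary choices for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Deflate: hypothetical helper that compresses a buffer and returns the raw deflate bytes.
static byte[] Deflate(byte[] input)
{
    using var output = new MemoryStream();
    using (var deflate = new DeflateStream(output, CompressionMode.Compress))
    {
        deflate.Write(input, 0, input.Length);
    } // disposing the DeflateStream flushes the final block into 'output'
    return output.ToArray(); // ToArray works even though 'output' is now closed
}

// Pseudo-random bytes are effectively incompressible; the fixed seed makes runs repeatable.
var random = new Random(42);
var noise = new byte[1000];
random.NextBytes(noise);

byte[] packed = Deflate(noise);
Console.WriteLine($"input: {noise.Length} bytes, deflated: {packed.Length} bytes");
```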
I’m OK with this, but what I’m not OK with is the documented behaviour of the count parameter of GZipStream/DeflateStream.Write(): “The maximum number of compressed bytes to write.” The usual practice (unless compressing in chunks) is to pass in the length of the input data:
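using System.IO;
using System.IO.Compression;

public static byte[] Compress(byte[] data)
{
    using (var compressed = new MemoryStream(data.Length))
    {
        // Disposing the DeflateStream flushes its final block into 'compressed'.
        using (var compressor = new DeflateStream(compressed, CompressionMode.Compress))
            compressor.Write(data, 0, data.Length);
        // ToArray still works after the DeflateStream has closed the MemoryStream.
        return compressed.ToArray();
    }
}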
Taken literally, the documentation means that in the edge case I’m talking about, the Write call won’t write out the whole compressed data stream, just the first data.Length bytes of it. I could just double the buffer size, but for large data sets that’s a bit wasteful, and anyway I don’t like the guesswork.
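On the buffer-size worry specifically: the data.Length passed to the MemoryStream constructor in the snippet is only an initial capacity. A MemoryStream created with MemoryStream(int) is expandable, so that number never caps how many bytes DeflateStream can write into it. A minimal check, written as C# top-level statements with arbitrary sizes:

```csharp
using System;
using System.IO;

// The int argument is only the initial capacity; this MemoryStream is still expandable.
var stream = new MemoryStream(4);
var payload = new byte[10];
stream.Write(payload, 0, payload.Length); // writes past the initial capacity without error

Console.WriteLine($"length: {stream.Length}, capacity: {stream.Capacity}");
```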
Is there a better way to do this?
I am pretty sure that this is a mistake in the documentation. The documentation for earlier versions reads "The number of bytes compressed.", which is consistent with how all other streams work.
The same change was made to the documentation of the Read method, where it makes sense, but I think it was applied to the documentation of the Write method by mistake. Someone corrected the documentation of the Read method and assumed that the same correction applied to the Write method as well.
The normal behavior for the Read method of a stream is that it can return less data than requested, and the method returns the number of bytes actually placed in the buffer. The Write method, on the other hand, always writes all the data specified; it wouldn’t make sense for any implementation to write less. And as the method has no return value, it could not report the number of bytes written anyway.
The count specified is not the size of the output; it’s the size of the data that you pass into the method. If the output is larger than the input, all of it will still be written to the stream (DeflateStream may hold some of it in an internal buffer until it is flushed or disposed).
Edit:
I added a comment about this to the community content of the documentation of the method in MSDN Library. Let’s see if Microsoft follows up on that…