I am working on some homework for Huffman coding. I already have the Huffman algorithm completed, but need to alter it slightly to work with binary files. I have spent some time reading related questions, and perhaps because of my limited understanding of data types and binary files, I am still struggling a bit, so hopefully I am not repeating a prior question. (I won’t be posting the code for the Huffman part of the program.)
Here is the key phrase: “You can assume that each symbol, which will be mapped to a codeword, is a 4-byte binary string.” What I think I know is that a char is one byte and an unsigned int is four bytes, so I am guessing I should read the input four bytes at a time into an unsigned int buffer and then collect my data for the Huffman part of the program.
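The type assumptions in that guess can be checked at compile time. `sizeof(char)` is 1 by definition, but `unsigned int` is only guaranteed to be at least 16 bits wide (it happens to be 32 bits on mainstream desktop compilers), so `std::uint32_t` states the "exactly 4 bytes" intent more precisely. A small sketch; the `Symbol` alias and the two helper names are hypothetical:

```cpp
#include <climits>
#include <cstdint>

// Hypothetical alias for one 4-byte Huffman symbol.
using Symbol = std::uint32_t;

// Fail the build instead of silently misreading the file on an unusual platform.
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
static_assert(sizeof(Symbol) == 4, "a symbol must occupy exactly 4 bytes");

// How many complete 4-byte symbols a file of file_size bytes contains.
constexpr std::uint64_t whole_symbols(std::uint64_t file_size) {
    return file_size / sizeof(Symbol);
}

// How many bytes are left over at the end (0 to 3).
constexpr std::uint64_t leftover_bytes(std::uint64_t file_size) {
    return file_size % sizeof(Symbol);
}
```

For a 10-byte file this gives 2 whole symbols and 2 leftover bytes, which is exactly the end-of-file case asked about below.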
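```cpp
#include <cstdint>
#include <fstream>
#include <iostream>

int main() {
    std::uint32_t buffer = 0;  // exactly 32 bits, unlike a plain unsigned int
    std::ifstream input("test.txt", std::ios::binary);
    if (!input) {
        std::cerr << "could not open test.txt\n";
        return 1;
    }
    // Test the stream *after* each read: a read that hits end of file before
    // filling all 4 bytes fails and ends the loop instead of being processed.
    while (input.read(reinterpret_cast<char *>(&buffer), sizeof buffer)) {
        //if buffer does not exist as unique symbol in collection of data add it
        //if buffer exists update statistics of symbol
    }
    // input.gcount() is now the number of leftover bytes at the end (0 to 3).
    input.close();
}
```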
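The two placeholder comments in the loop (add a new symbol, or update its statistics) map naturally onto a hash map keyed by the 4-byte value. A minimal sketch, assuming an `std::unordered_map` frequency table; `count_symbols` and `FrequencyTable` are hypothetical names, and each symbol value is taken in the machine's native byte order:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical frequency table: 4-byte symbol value -> number of occurrences.
using FrequencyTable = std::unordered_map<std::uint32_t, std::size_t>;

// Counts 4-byte symbols read from a binary stream.
// Returns the number of trailing bytes (0 to 3) that did not form a full symbol.
std::size_t count_symbols(std::istream& in, FrequencyTable& freq) {
    char bytes[4];
    // The loop condition is the read itself, so a short final read
    // is never counted as a symbol.
    while (in.read(bytes, sizeof bytes)) {
        std::uint32_t symbol;
        std::memcpy(&symbol, bytes, sizeof symbol);  // native byte order
        ++freq[symbol];  // operator[] inserts 0 for a new symbol, then increments
    }
    // After the failed read, gcount() reports how many bytes it did extract.
    return static_cast<std::size_t>(in.gcount());
}

// Convenience overload for data already in memory; std::string can hold
// arbitrary bytes, including '\0'.
std::size_t count_symbols(const std::string& data, FrequencyTable& freq) {
    std::istringstream in(data, std::ios::binary);
    return count_symbols(in, freq);
}
```

With an `std::ifstream` opened in binary mode, the first overload replaces the loop in the question directly; the frequency table is then the input to the tree-building step.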
Does this look like a good way to handle the data? How should I handle the very end of the file if only 1, 2, or 3 bytes are left? After that, I am just storing buffer as an unsigned int in a struct. Just out of curiosity, how would I convert buffer back to a string of characters?
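For the trailing 1 to 3 bytes, one common approach (an assumption here, since the quoted assignment text does not say what to do) is to zero-pad the last partial symbol to 4 bytes and record how many of its bytes were real, so the decoder can strip the padding afterwards. A sketch that works on the file contents already loaded into an `std::string`; `to_padded_symbols` and `from_padded_symbols` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Splits raw bytes into 4-byte symbols (native byte order). Assumed convention:
// the final partial symbol is zero-padded, and tail_bytes receives how many of
// its bytes were real (0 means the input length was a multiple of 4).
std::vector<std::uint32_t> to_padded_symbols(const std::string& data,
                                             std::size_t& tail_bytes) {
    std::vector<std::uint32_t> symbols;
    const std::size_t full = data.size() / 4;
    tail_bytes = data.size() % 4;
    for (std::size_t i = 0; i < full; ++i) {
        std::uint32_t s;
        std::memcpy(&s, data.data() + 4 * i, 4);
        symbols.push_back(s);
    }
    if (tail_bytes != 0) {
        char last[4] = {0, 0, 0, 0};  // zero padding
        std::memcpy(last, data.data() + 4 * full, tail_bytes);
        std::uint32_t s;
        std::memcpy(&s, last, 4);
        symbols.push_back(s);
    }
    return symbols;
}

// Inverse: turns decoded symbols back into bytes and drops the padding.
std::string from_padded_symbols(const std::vector<std::uint32_t>& symbols,
                                std::size_t tail_bytes) {
    std::string data(symbols.size() * 4, '\0');
    if (!symbols.empty()) std::memcpy(&data[0], symbols.data(), data.size());
    if (tail_bytes != 0 && !data.empty()) data.resize(data.size() - (4 - tail_bytes));
    return data;
}
```

A whole binary file can be loaded into such a string, for example by constructing it from a pair of `std::istreambuf_iterator<char>` over an `std::ifstream` opened with `std::ios::binary`. The tail count itself has to travel with the compressed data, typically in the header.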
Edit: What’s the best way to store the header of a Huffman-compressed file?
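No format is given in the question, so the following is only one possible layout (hypothetical, not a standard): one byte for the number of real bytes in the padded last symbol, then the number of distinct symbols, then each symbol with its frequency, so the decoder can rebuild the same Huffman tree. Integers are written byte by byte in little-endian order so the file does not depend on the machine's endianness. `put_le`, `get_le`, `write_header` and `read_header` are made-up names:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical layout, all integers little-endian:
//   1 byte   tail_bytes (0-3)
//   4 bytes  N = number of distinct symbols
//   N times: 4-byte symbol, 8-byte frequency

// Appends the low `bytes` bytes of value, least significant byte first.
void put_le(std::string& out, std::uint64_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Reads `bytes` little-endian bytes at pos and advances pos.
// at() throws std::out_of_range if the header is truncated.
std::uint64_t get_le(const std::string& in, std::size_t& pos, int bytes) {
    std::uint64_t value = 0;
    for (int i = 0; i < bytes; ++i) {
        unsigned char byte = static_cast<unsigned char>(in.at(pos++));
        value |= static_cast<std::uint64_t>(byte) << (8 * i);
    }
    return value;
}

// std::map keeps the symbols sorted, so the header is deterministic.
std::string write_header(const std::map<std::uint32_t, std::uint64_t>& freq,
                         std::size_t tail_bytes) {
    std::string out;
    put_le(out, tail_bytes, 1);
    put_le(out, freq.size(), 4);
    for (const auto& entry : freq) {
        put_le(out, entry.first, 4);   // the symbol
        put_le(out, entry.second, 8);  // its frequency
    }
    return out;
}

std::map<std::uint32_t, std::uint64_t> read_header(const std::string& in,
                                                   std::size_t& pos,
                                                   std::size_t& tail_bytes) {
    std::map<std::uint32_t, std::uint64_t> freq;
    tail_bytes = static_cast<std::size_t>(get_le(in, pos, 1));
    const std::uint64_t n = get_le(in, pos, 4);
    for (std::uint64_t i = 0; i < n; ++i) {
        std::uint32_t symbol = static_cast<std::uint32_t>(get_le(in, pos, 4));
        freq[symbol] = get_le(in, pos, 8);
    }
    return freq;
}
```

Storing frequencies keeps the header simple, but the decoder must then build the tree with exactly the same tie-breaking rules as the encoder. Storing canonical Huffman code lengths is a more compact alternative.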
Instead of casting a pointer, I would suggest using a union of an int and a char[4], and passing a pointer to the char array to read(), as you should be. I don’t know the rest of your logic, so I can’t say whether the actual handling (which is not in the code you posted) is done well, but it seems rather trivial. As for the last 1 to 3 bytes: assuming each symbol is 4 bytes long, I would expect a file whose length is not a multiple of 4 not to be valid input.
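A note on the union idea: reading a union member other than the one most recently written is permitted in C, but it is technically undefined behavior in standard C++, even though GCC documents it as supported. A portable sketch of the same int-to-char[4] view uses `std::memcpy` instead (the function names are hypothetical):

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Copies the object representation of the symbol into 4 chars.
// The byte order is the machine's native one.
std::array<char, 4> symbol_to_bytes(std::uint32_t symbol) {
    std::array<char, 4> bytes;
    std::memcpy(bytes.data(), &symbol, sizeof symbol);
    return bytes;
}

// Inverse: rebuilds the 4-byte symbol from the 4 chars.
std::uint32_t bytes_to_symbol(const std::array<char, 4>& bytes) {
    std::uint32_t symbol;
    std::memcpy(&symbol, bytes.data(), sizeof symbol);
    return symbol;
}
```

Compilers typically reduce these `memcpy` calls to a plain register move. `std::string(bytes.data(), bytes.size())` then gives the 4-character string the question asked about.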
Why would you do that? In your data, a “character” is 4 bytes. But you can simply cast to an array of bytes if you want (or, better, use bitwise operations to extract the individual bytes if the byte order matters).
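Shifts and masks produce the same bytes on every machine, whereas a cast or `memcpy` exposes the native endianness. A sketch that extracts the bytes most significant first (big-endian order, an arbitrary choice here); `split_be` and `join_be` are hypothetical names:

```cpp
#include <array>
#include <cstdint>

// Splits a symbol into bytes, most significant byte first,
// independent of the machine's endianness.
std::array<std::uint8_t, 4> split_be(std::uint32_t symbol) {
    return {{static_cast<std::uint8_t>(symbol >> 24),
             static_cast<std::uint8_t>((symbol >> 16) & 0xFF),
             static_cast<std::uint8_t>((symbol >> 8) & 0xFF),
             static_cast<std::uint8_t>(symbol & 0xFF)}};
}

// Inverse: reassembles the bytes in the same order.
std::uint32_t join_be(const std::array<std::uint8_t, 4>& b) {
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8) |
           static_cast<std::uint32_t>(b[3]);
}
```

For example, `split_be(0x11223344)` yields the bytes `0x11, 0x22, 0x33, 0x44` on both little- and big-endian machines.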