I have a scenario where I receive binary data in a buffer (from a com port, or a socket, or some other producer, etc.). The data received is interpreted in different ways, typically keying off the message header in the first byte or first few bytes. I am looking for the best class structure to handle and parse data like this. This seems like it would be easy to develop, but for some reason it is not apparent to me.
One option I came up with is using unions in a POD type class. For example:
class Message {
public:
void DoSomething();
int packetId;
union {
struct { int A; int B; /* ... */ };      // packet type 1
struct { float M; short N; /* ... */ };  // packet type 2; may be a different size than packet type 1
...
};
};
void Message::DoSomething() {
switch (packetId) {
case 1:
// do something using packetType1
break;
case 2:
// do something using packetType2
break;
}
}
Is it acceptable practice to pass a pointer to such an object into a function that takes a buffer as input? This compiles and appears to work. For example:
Message msg;
recvfrom(sock, (char*) &msg, sizeof(msg), ...);
msg.DoSomething();
One downside to this is the member variables in Message are public. I would rather make them private and provide read only access methods. If the member variables in Message are made private (or protected), does this still work? I think no, but am not sure.
I have considered using inheritance, but the issue is that one doesn’t know which derived class is needed until after the data has been received and the header parsed. This example compiles and appears to work, but it seems like a bad approach.
class Message {
public:
virtual void DoSomething();
int packetId;
};
class PacketType1 : public Message {
public:
void DoSomething();
int A, B;
};
class PacketType2 : public Message {
public:
void DoSomething();
float M;
short N;
};
void Message::DoSomething() {
switch (packetId) {
case 1:
((PacketType1 *) this)->DoSomething();
break;
case 2:
((PacketType2 *) this)->DoSomething();
break;
}
}
Usage:
Message* msg = (Message*) new unsigned char [MAX_MESSAGE_SIZE];
recvfrom(sock, (char*) msg, MAX_MESSAGE_SIZE, ...);
msg->DoSomething();
What are the best practices for a scenario like this? Please be gentle; I am not a software guy by trade or schooling, only sometimes out of necessity. This is one of those times. 🙂 Thanks.
EDIT: A couple of folks have mentioned endianness. I forgot to mention in the original post that the message source could have the same or different endianness than my system, but this is known a priori. My intention is for one of the Message class methods to handle this when necessary. For example:
void Message::ByteSwap() {
ByteSwap4(packetId); // helper function that byte swaps a 4-byte word
switch(packetId) {
case 1:
ByteSwap4(A);
ByteSwap4(B);
...
break;
case 2:
ByteSwap4(M);
ByteSwap2(N); // helper function that byte swaps a 2-byte word
...
break;
}
}
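The ByteSwap4/ByteSwap2 helpers aren’t shown. Here is a minimal sketch of one way to write them, assuming C++11 or later; ReverseBytes is a hypothetical name. Copying the value into a byte array with memcpy, reversing the array, and copying it back works for int, short, and float alike, without type punning through pointer casts.

```cpp
#include <algorithm>    // std::reverse
#include <cassert>
#include <cstdint>
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

// Hypothetical helper: reverse the bytes of any trivially copyable value in place.
template <typename T>
void ReverseBytes(T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    std::reverse(bytes, bytes + sizeof(T));
    std::memcpy(&value, bytes, sizeof(T));
}

// One possible ByteSwap4: accepts any 4-byte type (int, float, uint32_t, ...).
template <typename T>
void ByteSwap4(T& value) {
    static_assert(sizeof(T) == 4, "ByteSwap4 expects a 4-byte type");
    ReverseBytes(value);
}

// One possible ByteSwap2: accepts any 2-byte type (short, uint16_t, ...).
template <typename T>
void ByteSwap2(T& value) {
    static_assert(sizeof(T) == 2, "ByteSwap2 expects a 2-byte type");
    ReverseBytes(value);
}
```

One caveat if Message is packed with #pragma pack(1): GCC refuses to bind a non-const reference such as T& to a member of a packed struct, because the member may be misaligned. In that case, copy the field into a local variable, swap the local, and copy it back.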
For method one, I neglected to mention that I have to use the compiler directive #pragma pack(1) in the .h file that defines the Message class to force members to be byte aligned.
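Here is a sketch of the packing approach, assuming GCC, Clang, or MSVC. All three accept #pragma pack(push, n) and #pragma pack(pop), though neither is standard C++. The PackedHeader struct is hypothetical. The two points it illustrates are the scoped push/pop, which keeps the packing from leaking into every struct declared after the #include, and the static_assert checks, which pin the layout to the documented byte structure at compile time.

```cpp
#include <cassert>
#include <cstddef>  // offsetof
#include <cstdint>

// push/pop limits the 1-byte packing to the structs in between.
#pragma pack(push, 1)
struct PackedHeader {
    std::uint32_t packetId;
    std::uint16_t length;  // hypothetical field, for illustration only
};
#pragma pack(pop)

// Same members without packing: on common platforms this is 8 bytes,
// because the compiler adds 2 bytes of tail padding.
struct UnpackedHeader {
    std::uint32_t packetId;
    std::uint16_t length;
};

// A layout mismatch becomes a compile error instead of a runtime bug.
static_assert(sizeof(PackedHeader) == 6, "PackedHeader must match the 6-byte wire layout");
static_assert(offsetof(PackedHeader, length) == 4, "length must start at byte 4");
```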
With regard to the source of the messages, I have no control over those systems. What I do have is formal documentation that defines the byte structure of the message being sent.
Thanks to all for your input!
Your second method is completely broken: you’re receiving a sequence of bytes with recvfrom and then interpreting it as a non-POD type, which is undefined behavior and likely to crash. Because Message has a virtual function, each object in practice carries a hidden vtable pointer that only a constructor sets up. The received bytes won’t contain a valid one unless they were sent by the exact same process on the same machine. Even if the sender is running identical code on an identical machine, it probably won’t work, since the vtable’s address can differ from one process to the next.

The first method gets into all kinds of implementation-defined details about padding and alignment of the structs and unions, but may work OK as long as the sender was built with the same compiler targeting the same architecture. It’s most likely to be OK if you use the fixed-size types from stdint.h (int32_t, uint16_t, and so on) and carefully arrange things so that padding is unlikely to be needed for alignment. Even then, you still have potential endianness issues if you try to do this between different architectures.
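To illustrate the fixed-size-type approach, here is a sketch that assumes a 4-byte IEEE-754 float and a hypothetical payload layout. The fields are ordered so that every member falls on its natural alignment and no padding is needed. The static_asserts fail the build if that assumption doesn’t hold, and memcpy copies the bytes out of the receive buffer instead of casting the buffer pointer to the struct type. Byte order is still the host’s, so this does nothing about endianness.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t
#include <cstdint>
#include <cstring>  // std::memcpy
#include <type_traits>

// Hypothetical payload for packet type 2. Each member sits at a multiple
// of its own size, so no padding should be needed even without #pragma pack.
struct PacketType2Payload {
    std::int32_t packetId;  // offset 0
    float        M;         // offset 4
    std::int16_t N;         // offset 8
    std::int16_t reserved;  // offset 10 (hypothetical); total size stays a multiple of 4
};

static_assert(sizeof(float) == 4, "expected a 4-byte float");
static_assert(std::is_trivially_copyable<PacketType2Payload>::value, "must be memcpy-able");
static_assert(sizeof(PacketType2Payload) == 12, "unexpected padding");
static_assert(offsetof(PacketType2Payload, N) == 8, "unexpected offset for N");

// Copy the raw bytes into a properly typed object instead of casting the
// buffer pointer. memcpy has no alignment requirement on the source.
inline bool ParsePacketType2(const unsigned char* buf, std::size_t len,
                             PacketType2Payload& out) {
    if (len < sizeof out) return false;  // truncated message
    std::memcpy(&out, buf, sizeof out);
    return true;
}
```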
The best way to do this is to bite the bullet: define your message as a byte stream, and write code that explicitly converts an object into a byte stream and builds a new object from a byte stream. One way that works reasonably well is an inheritance hierarchy with a virtual Encode method and a static Decode method.
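Here is a minimal sketch of such a hierarchy, assuming C++14. It uses a hypothetical wire format in which every field is big-endian and packet type 1 is a 4-byte packetId followed by two 4-byte signed integers A and B; the real layout would come from your formal documentation. Fields are written and read one byte at a time, so neither the host’s byte order nor the compiler’s struct layout affects the result. PutU32, GetU32, kId, and kSize are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical wire format for packet type 1 (all fields big-endian):
//   bytes 0-3: packetId (= 1), bytes 4-7: A (int32), bytes 8-11: B (int32)

// Append v to `out`, most significant byte first.
inline void PutU32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Read 4 bytes starting at p as a big-endian unsigned value.
inline std::uint32_t GetU32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

class Message {
public:
    virtual ~Message() = default;
    // Append this message's wire representation to `out`.
    virtual void Encode(std::vector<std::uint8_t>& out) const = 0;
};

class PacketType1 : public Message {
public:
    static constexpr std::uint32_t kId = 1;
    static constexpr std::size_t kSize = 12;  // packetId + A + B

    PacketType1(std::int32_t a, std::int32_t b) : A(a), B(b) {}

    // Read-only access to the fields.
    std::int32_t GetA() const { return A; }
    std::int32_t GetB() const { return B; }

    void Encode(std::vector<std::uint8_t>& out) const override {
        PutU32(out, kId);
        // Signed values travel as their two's-complement bit pattern.
        PutU32(out, static_cast<std::uint32_t>(A));
        PutU32(out, static_cast<std::uint32_t>(B));
    }

    // Static: it runs before any PacketType1 exists and builds one from raw bytes.
    // Returns nullptr if the buffer is too short or carries a different packetId.
    static std::unique_ptr<PacketType1> Decode(const std::uint8_t* p, std::size_t len) {
        if (len < kSize || GetU32(p) != kId) return nullptr;
        return std::make_unique<PacketType1>(static_cast<std::int32_t>(GetU32(p + 4)),
                                             static_cast<std::int32_t>(GetU32(p + 8)));
    }

private:
    std::int32_t A, B;
};
```

Keeping A and B private behind const getters also provides the read-only access the question asked for. This works because received bytes are never poured directly into the object’s memory.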
You can leave out the Encode methods if you only need to decode. But if you’re writing code for both ends of the communication, it makes sense to keep Encode and Decode together so they remain consistent.
The Decode methods are static because they are called before an object of the decoded type exists (before you even know what that type is). Each one creates an object from the message bytes and returns it.
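Here is a sketch of how the static Decode functions fit together. It assumes C++14 and a hypothetical big-endian format: packet type 1 carries two int32 fields, and packet type 2 carries an IEEE-754 float32 and an int16. A base-class Message::Decode reads only the 4-byte packetId and forwards to the matching derived Decode. The block is self-contained, so it declares its own minimal Message hierarchy. The Id() accessor stands in for whatever virtual methods (such as DoSomething) the real classes would have.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical format (big-endian): packetId:uint32, then
//   type 1: A:int32, B:int32      type 2: M:float32, N:int16
inline std::uint32_t GetU32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}
inline std::uint16_t GetU16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

class Message {
public:
    virtual ~Message() = default;
    virtual std::uint32_t Id() const = 0;
    // Reads the header and dispatches; nullptr for unknown ids or short buffers.
    static std::unique_ptr<Message> Decode(const std::uint8_t* p, std::size_t len);
};

class PacketType1 : public Message {
public:
    PacketType1(std::int32_t a, std::int32_t b) : A(a), B(b) {}
    std::uint32_t Id() const override { return 1; }
    std::int32_t GetA() const { return A; }
    std::int32_t GetB() const { return B; }
    static std::unique_ptr<Message> Decode(const std::uint8_t* p, std::size_t len) {
        if (len < 12) return nullptr;  // 4 (id) + 4 (A) + 4 (B)
        return std::make_unique<PacketType1>(static_cast<std::int32_t>(GetU32(p + 4)),
                                             static_cast<std::int32_t>(GetU32(p + 8)));
    }
private:
    std::int32_t A, B;
};

class PacketType2 : public Message {
public:
    PacketType2(float m, std::int16_t n) : M(m), N(n) {}
    std::uint32_t Id() const override { return 2; }
    float GetM() const { return M; }
    std::int16_t GetN() const { return N; }
    static std::unique_ptr<Message> Decode(const std::uint8_t* p, std::size_t len) {
        if (len < 10) return nullptr;  // 4 (id) + 4 (M) + 2 (N)
        std::uint32_t bits = GetU32(p + 4);
        float m;
        static_assert(sizeof m == sizeof bits, "expected a 4-byte float");
        std::memcpy(&m, &bits, sizeof m);  // reuse the IEEE-754 bit pattern
        return std::make_unique<PacketType2>(m, static_cast<std::int16_t>(GetU16(p + 8)));
    }
private:
    float M;
    std::int16_t N;
};

inline std::unique_ptr<Message> Message::Decode(const std::uint8_t* p, std::size_t len) {
    if (len < 4) return nullptr;
    switch (GetU32(p)) {
        case 1: return PacketType1::Decode(p, len);
        case 2: return PacketType2::Decode(p, len);
        default: return nullptr;
    }
}
```

On the receiving side, the bytes from recvfrom would go straight to the dispatcher, for example `auto msg = Message::Decode(buf, n);`. Here buf is a hypothetical receive buffer and n is the byte count recvfrom returned, after checking it for an error. An unknown packetId or a truncated message yields nullptr instead of an object built from garbage.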
Note that this approach makes the size, padding, and byte ordering of every part of the message explicit, rather than relying on however the compiler happens to lay things out.