AFAIK, Hadoop Streaming only supports text input, which means the data is organized by lines. But the mapper code becomes messy if we want backward compatibility, i.e. supporting different versions of log lines in the same mapper program written in C++.
I considered Avro and Protocol Buffers, but it seems that they are not supported in streaming mode. Is that true?
Is there any other solution?
Hadoop Streaming is not limited to text: other input/output formats can be used with it by passing the format classes through the -inputformat and -outputformat options.
Avro support has been added for Hadoop Streaming. See AVRO-808 and AVRO-830. This thread might also be useful.
I could not find InputFormat and OutputFormat classes for Protocol Buffers, so those would need to be written as custom classes.
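Whichever input format is used, a streaming mapper still reads records from stdin and writes tab-separated key/value pairs to stdout, so one way to keep the C++ side tidy is to normalize every log version into a single struct right after reading it. The following is only a rough sketch under assumptions: the `v1`/`v2` version tags, the comma-separated field layouts, and the names `LogRecord`, `parse_line`, and `run_mapper` are all hypothetical and not taken from any real log format or Hadoop API.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical normalized record shared by all log versions.
struct LogRecord {
    std::string user;
    std::string action;
};

// Split a line on a single-character delimiter.
std::vector<std::string> split(const std::string& line, char delim) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}

// Parse one log line into a LogRecord.
// Assumed (illustrative) layouts:
//   v1,<user>,<action>
//   v2,<timestamp>,<user>,<action>
// Returns false for an unknown version or a wrong field count.
bool parse_line(const std::string& line, LogRecord& out) {
    const std::vector<std::string> f = split(line, ',');
    if (f.empty()) return false;
    if (f[0] == "v1" && f.size() == 3) {
        out = {f[1], f[2]};
        return true;
    }
    if (f[0] == "v2" && f.size() == 4) {
        out = {f[2], f[3]};
        return true;
    }
    return false;
}

// Mapper loop: one record per input line, "key<TAB>value" per output line.
// Lines that cannot be parsed are skipped.
void run_mapper(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        LogRecord rec;
        if (!parse_line(line, rec)) continue;
        out << rec.user << '\t' << rec.action << '\n';
    }
}
```

In the actual mapper binary, `main` would simply call `run_mapper(std::cin, std::cout)`. Supporting a new log version then means adding one more branch to `parse_line`, while the rest of the mapper only ever sees `LogRecord`.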