In my Java application, I am archiving TIBCO RV messages to a file as bytes.
I am writing a small utility app that will play the messages back. Archiving the raw bytes means I can create a TibrvMsg object directly from them, without having to parse the file and construct the object manually.
The problem is that I am reading a file that was created on a Linux box and running my app on a Windows machine, and I get an error because of the different charset the file was written in.
So what I want to do now is log each message in a specific charset (UTF-8), so that it doesn’t matter which platform I run my playback app on. The app would then read the file knowing beforehand which charset it was written in. I am planning to use the java.nio packages to transform the bytes from one charset to another.
Do I need to know what charset the TIBRV message bytes are encoded in to do the transformation? If so, how can I find this out?
You are taking opaque data and, it would appear, attempting to write it to a file as textual data without escaping the non-textual portions of it. (Alternatively, you are writing it as raw bytes and then trying to read it back as if it were character-based, which is much the same problem.)
This is flawed from the very start.
Opaque data should be treated as meaningless and simply stored without modification, so it can be handed back to an API that does know how to deal with it. If the data must be stored in a textual form, then you must convert the bytes into text losslessly; base64 is an appropriate encoding for this. Character set encoding, by contrast, is NOT lossless when applied to raw binary data: byte sequences that are invalid in the chosen charset get replaced or rejected.
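To make the lossy-versus-lossless point concrete, here is a small sketch using only the JDK. The class and method names are illustrative, not part of any TIBCO API. It pushes the same arbitrary bytes through a UTF-8 decode/encode round trip and through a base64 round trip, then compares the results with the original.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class ByteRoundTrips {
    // Lossy: treat the bytes as UTF-8 text and re-encode them.
    // Malformed sequences are silently replaced with U+FFFD during decoding.
    static byte[] viaCharset(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Lossless: base64 maps every possible byte value to text and back exactly.
    static byte[] viaBase64(byte[] data) {
        String text = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // 0xC3 0x28 is an invalid UTF-8 sequence; 0xFF never appears in valid UTF-8
        byte[] original = {(byte) 0xC3, 0x28, (byte) 0xFF, 0x00, 0x41};

        System.out.println("charset round trip lossless: " + Arrays.equals(original, viaCharset(original)));
        System.out.println("base64 round trip lossless:  " + Arrays.equals(original, viaBase64(original)));
    }
}
```

The charset round trip reports `false` and the base64 one reports `true`. Opaque message bytes can contain any byte value, so only the second approach is safe if you insist on a textual file.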
Simply storing the bytes in a file as bytes (not characters), along with a fixed-length prefix giving the length of the message and the subject it was sent on, is sufficient to replay RV messages through the system.
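A minimal sketch of that kind of archive format, using only `java.io`, follows. It is one possible layout, not a TIBCO-defined one. Each record is written as a 4-byte subject length, the subject bytes, a 4-byte payload length, and then the payload bytes untouched. The subject is encoded explicitly as UTF-8 (an assumption, so that its encoding never depends on the platform default), and `DataOutputStream` always writes integers big-endian, so the file reads identically on Linux and Windows. The `payload` array is assumed to be whatever byte form your archiver already gets from the message (for example via `TibrvMsg.getAsBytes()`). On replay you would hand it back to the RV API to rebuild the message; check the TIBCO Rendezvous Java documentation for the exact calls.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RvArchive {
    /** One archived message: the subject it was sent on plus the opaque message bytes. */
    static final class Entry {
        final String subject;
        final byte[] payload;

        Entry(String subject, byte[] payload) {
            this.subject = subject;
            this.payload = payload;
        }
    }

    /** Appends one record: int subjectLength, subject (UTF-8), int payloadLength, payload. */
    static void write(DataOutputStream out, String subject, byte[] payload) throws IOException {
        byte[] subjectBytes = subject.getBytes(StandardCharsets.UTF_8);
        out.writeInt(subjectBytes.length);
        out.write(subjectBytes);
        out.writeInt(payload.length);
        out.write(payload); // raw bytes, never passed through a charset
    }

    /** Reads records until the end of the stream is reached. */
    static List<Entry> readAll(DataInputStream in) throws IOException {
        List<Entry> entries = new ArrayList<>();
        while (true) {
            int subjectLength;
            try {
                subjectLength = in.readInt();
            } catch (EOFException e) {
                return entries; // no more records in the archive
            }
            byte[] subjectBytes = new byte[subjectLength];
            in.readFully(subjectBytes);
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            entries.add(new Entry(new String(subjectBytes, StandardCharsets.UTF_8), payload));
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            // Hypothetical subject and payload, standing in for a real archived message
            write(out, "DEMO.SUBJECT", new byte[] {(byte) 0xC3, 0x28, (byte) 0xFF});
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            for (Entry e : readAll(in)) {
                System.out.println(e.subject + " -> " + e.payload.length + " bytes");
            }
        }
    }
}
```

In a real archiver you would wrap a `FileOutputStream`/`FileInputStream` (ideally buffered) instead of the in-memory streams used in `main`.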
As for any text-based fields inside the message: if the encoding matters (I strongly suggest designing the app so that, in general, it doesn’t), then on replay you have exactly the same problem you had at original receipt time, namely converting from the source encoding to the desired encoding, hopefully with exactly the same code. So this should be a non-issue as far as replaying is concerned.
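If you do need to turn a text field’s bytes into a `String`, one way to keep receipt and replay consistent is to route both through a single helper with a fixed charset. Below is a sketch using `java.nio.charset`; the helper name and the choice of UTF-8 as the agreed field charset are assumptions for illustration. It deliberately avoids `new String(bytes)` and `String.getBytes()` without a charset argument, which use the platform default charset (before JDK 18 that varied by OS and is a common source of Linux versus Windows differences). It also reports malformed input instead of silently substituting U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class FieldDecoding {
    // Hypothetical: the charset the publishers are assumed to use for text fields
    static final Charset FIELD_CHARSET = StandardCharsets.UTF_8;

    /** Decodes a text field with a fixed charset, throwing on malformed or unmappable input. */
    static String decodeField(byte[] raw) throws CharacterCodingException {
        CharsetDecoder decoder = FIELD_CHARSET.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(raw)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] field = "Z\u00fcrich".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeField(field).equals("Z\u00fcrich"));
    }
}
```

Because the same method runs at receipt and at replay, the result no longer depends on which machine the playback app happens to run on.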