I had a requirement to encode a 3-character string (always letters) into a byte[] array of length 2.
This was done to save space and for performance reasons.
Now the requirement has changed a bit. The String will be of variable length. It will either be of length 3 (as above) or of length 4 with 1 special character at the beginning. The special character is fixed, i.e. if we choose @ it will always be @ and always at the beginning. So we are sure that if the length of the String is 3, it will contain only letters, and if the length is 4, the first character will always be ‘@’ followed by 3 letters.
So I can use
charsAsNumbers[0] = (byte) (locationChars[0] - '@');
instead of
charsAsNumbers[0] = (byte) (chars[0] - 'A');
Can I still encode the 3 or 4 chars into a 2-byte array and decode them back? If so, how?
Yes, it is possible to encode an extra bit of information while maintaining the previous encoding for 3 character values. But since your original encoding doesn’t leave nice clean swaths of free numbers in the output set, mapping of the additional set of Strings introduced by adding that extra character cannot help but be a little discontinuous.
Accordingly, I think it would be hard to come up with mapping functions that handle these discontinuities without being both awkward and slow. I conclude that a table-based mapping is the only sane solution.
I was too lazy to re-engineer your mapping code, so I incorporated it into the table initialization code of mine; this also eliminates many opportunities for translation errors 🙂 Your encode() method is what I call OldEncoder.encode().
I’ve run a small test program to verify that NewEncoder.encode() comes up with the same values as OldEncoder.encode(), and is in addition able to encode Strings with a leading 4th character. NewEncoder.encode() doesn’t care what the character is, it goes by String length; for decode(), the character used can be defined using PREFIX_CHAR. I’ve also eyeball checked that the byte array values for prefixed Strings don’t duplicate any of those for non-prefixed Strings; and finally, that encoded prefixed Strings can indeed be converted back to the same prefixed Strings.
I’ve left a few intricate constant expressions in the code, especially the powers-of-26 stuff. The code looks horribly mysterious otherwise. You can leave those as they are without losing performance, as the compiler folds them up like Kleenexes.
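For readers who just want the shape of the table-based idea, here is a minimal sketch. It is not the OldEncoder/NewEncoder code described above: the class name TableCodecSketch, the helpers letters() and index(), and above all oldEncode() are hypothetical. In particular, oldEncode() assumes a plain base-26 packing as a stand-in for the original encoder, which may well differ. The point it illustrates is that there are 26^3 = 17576 three-letter Strings, so both the plain and the prefixed set (35152 values) fit into the 65536 codes of 2 bytes; plain Strings keep whatever code the old encoder gives them, and prefixed Strings are assigned the unused codes in order.

```java
import java.util.Arrays;

class TableCodecSketch {
    static final char PREFIX_CHAR = '@';           // prefix emitted by decode()
    static final int N = 26 * 26 * 26;             // 17576 three-letter Strings

    static final int[] PLAIN = new int[N];         // letter index -> code, no prefix
    static final int[] PREFIXED = new int[N];      // letter index -> code, with prefix
    static final int[] DECODE = new int[1 << 16];  // code -> index (+N if prefixed), -1 if unused

    // ASSUMPTION: stand-in for the original encoder, plain base-26 packing.
    static int oldEncode(String s) {
        int v = 0;
        for (int i = 0; i < 3; i++) v = v * 26 + (s.charAt(i) - 'A');
        return v;
    }

    // Builds the i-th three-letter String, "AAA" for 0 up to "ZZZ" for N - 1.
    static String letters(int i) {
        return "" + (char) ('A' + i / 676) + (char) ('A' + i / 26 % 26) + (char) ('A' + i % 26);
    }

    // Inverse of letters(), reading three letters of s starting at position from.
    static int index(String s, int from) {
        return ((s.charAt(from) - 'A') * 26 + (s.charAt(from + 1) - 'A')) * 26
                + (s.charAt(from + 2) - 'A');
    }

    static {
        Arrays.fill(DECODE, -1);
        // Pass 1: plain Strings keep the codes the old encoder produces.
        for (int i = 0; i < N; i++) {
            PLAIN[i] = oldEncode(letters(i));
            DECODE[PLAIN[i]] = i;
        }
        // Pass 2: prefixed Strings take the codes left unused, in ascending order.
        int free = 0;
        for (int i = 0; i < N; i++) {
            while (DECODE[free] != -1) free++;
            PREFIXED[i] = free;
            DECODE[free] = N + i;
        }
    }

    // Goes by String length only; input is not validated in this sketch.
    static byte[] encode(String s) {
        int from = s.length() - 3;                 // 0 for "ABC", 1 for "@ABC"
        int code = (from == 0 ? PLAIN : PREFIXED)[index(s, from)];
        return new byte[] { (byte) (code >>> 8), (byte) code };
    }

    static String decode(byte[] b) {
        int v = DECODE[((b[0] & 0xFF) << 8) | (b[1] & 0xFF)];
        if (v < 0) throw new IllegalArgumentException("unused code");
        return v < N ? letters(v) : PREFIX_CHAR + letters(v - N);
    }

    public static void main(String[] args) {
        for (String s : new String[] { "ABC", "@ABC", "ZZZ", "@ZZZ" }) {
            System.out.println(s + " -> " + Arrays.toString(encode(s)) + " -> " + decode(encode(s)));
        }
    }
}
```

Because the second pass only ever picks codes the first pass left free, it does not matter how scattered the old codes are; swapping a different old encoder into oldEncode() should leave the rest unchanged, as long as it maps each three-letter String to a distinct value in 0..65535.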
Update:
As the horror of X-mas approaches, I’ll be on the road for a while. I hope you’ll find this answer and code in time to make good use of it. In support of which effort I’ll throw in my little test program. It doesn’t directly check stuff but prints out the results of conversions in all significant ways and allows you to check them by eye and hand. I fiddled with my code (small tweaks once I got the basic idea down) until everything looked OK there. You may want to test more mechanically and exhaustively.