Suppose I have
String input = "1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,2,3,0,4,0,0,0,4,0,3";
I want to encode it into a string with fewer characters that also hides the actual information by representing it in Roman (Latin) letters, i.e. the above encodes to something like "Adqwqkjlhs". It must be possible to decode the encoded string back to the original string.
The string input is actually something I parse from the hash of a URL, but the original format is lengthy and open to manipulation.
Any ideas?
Thanks
Edit #1
Each number can be from 0 to 99, and the numbers are separated by commas so that String.split(",") can retrieve the String[].
Edit #2 (Purpose of encoded string)
Suppose the above string encodes to bmtwva1131gpefvb1xv; then I can have a URL like www.shortstring.com/input#bmtwva1131gpefvb1xv. From there I would decode bmtwva1131gpefvb1xv back into the comma-separated numbers.
I suggest you look at base64, which provides 6 bits of information per character. In general, your encoding efficiency is log2(K) bits per symbol, where K is the number of symbols in the set of allowable symbols.
An 8-bit character set has 256 symbols, but many of them are not permitted in URLs, so you need to choose a subset of legal URL characters.
Just to clarify: I didn't mean encode your "1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,2,3,0,4,0,0,0,4,0,3" string as base64. I meant figure out what information you really want to encode, expressed as a string of raw binary bytes, and encode that in base64. The base64 output will exclude control characters (although you might want to use an alternate form where all 64 characters can be used in URLs without escaping) and be more efficient than converting numbers to a printable number form.
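As a minimal sketch of that "alternate form": Java 8 and later ship a URL-safe base64 variant in java.util.Base64 that uses '-' and '_' in place of '+' and '/'. The class name UrlSafeBase64Demo and the sample bytes are made up for illustration; how you produce the raw bytes is up to you.

```java
import java.util.Arrays;
import java.util.Base64;

public class UrlSafeBase64Demo {
    // Encode raw bytes with the URL-safe alphabet (A-Z, a-z, 0-9, '-', '_')
    // and drop the trailing '=' padding so nothing needs escaping in a URL.
    static String encode(byte[] raw) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    // The URL-safe decoder accepts input with or without padding.
    static byte[] decode(String text) {
        return Base64.getUrlDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = {1, 2, 0, 99, (byte) 0xFF}; // arbitrary sample bytes
        String text = encode(raw);
        System.out.println(text + " -> " + Arrays.toString(decode(text)));
    }
}
```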
OK, now you have a clear definition. Here’s a suggestion:
Convert your information from its original form to a binary number / byte array. If all you have is a string of comma-separated numbers from 0-99, then here are two options:
(slow): treat the array as the digits of a number in base 100 and convert it to a BigInteger (e.g. n = n * 100 + x[i] for each number x[i] in the array), convert that to a byte array, and be sure to precede the whole thing by its length, so that "0,0,0,0" can be distinguished from "0,0" (numerically equal in base 100, but with a different length). Then convert the result to base64.
(more efficient): treat the values as numbers in base 128 (since that is a power of 2), so each one takes 7 bits, and use any value from 100-127 as a termination character. Each block of 6 numbers therefore contains 42 (= 6*7) bits of information, which can be encoded as a string of 7 characters using base64 (7*6 = 42). (Pad with termination characters as needed so that the count of numbers is a whole multiple of 6.)
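Here is a rough sketch of the base-128 option, under a few assumptions of mine: the class name Base128Packer is made up, 127 is picked as the termination value, the 64-symbol alphabet is the URL-safe one ('-' and '_' instead of '+' and '/'), and the input is a non-empty, well-formed list. With these choices the 31-number example from the question comes out as 42 characters.

```java
import java.util.ArrayList;
import java.util.List;

public class Base128Packer {
    // URL-safe 64-symbol alphabet (an assumption; any 64 legal URL characters work).
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    static final int TERMINATOR = 127; // any value in 100..127 would do

    static String encode(String csv) {
        String[] parts = csv.split(",");
        int padded = ((parts.length + 5) / 6) * 6; // round up to a multiple of 6
        StringBuilder out = new StringBuilder();
        for (int start = 0; start < padded; start += 6) {
            long block = 0; // 6 values * 7 bits = 42 bits
            for (int i = start; i < start + 6; i++) {
                int v = TERMINATOR; // padding past the end of the input
                if (i < parts.length) {
                    v = Integer.parseInt(parts[i].trim());
                    if (v < 0 || v > 99) {
                        throw new IllegalArgumentException("out of range: " + v);
                    }
                }
                block = (block << 7) | v;
            }
            // Emit the 42 bits as 7 symbols of 6 bits each, most significant first.
            for (int shift = 36; shift >= 0; shift -= 6) {
                out.append(ALPHABET.charAt((int) ((block >> shift) & 63)));
            }
        }
        return out.toString();
    }

    static String decode(String text) {
        if (text.length() % 7 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 7");
        }
        List<String> numbers = new ArrayList<>();
        for (int start = 0; start < text.length(); start += 7) {
            long block = 0;
            for (int i = start; i < start + 7; i++) {
                int index = ALPHABET.indexOf(text.charAt(i));
                if (index < 0) {
                    throw new IllegalArgumentException("bad symbol: " + text.charAt(i));
                }
                block = (block << 6) | index;
            }
            // Read back 6 values of 7 bits each; stop at the first terminator.
            for (int shift = 35; shift >= 0; shift -= 7) {
                int v = (int) ((block >> shift) & 127);
                if (v >= 100) {
                    return String.join(",", numbers);
                }
                numbers.add(Integer.toString(v));
            }
        }
        return String.join(",", numbers);
    }
}
```

Note that this only shortens the string. Anyone who knows the scheme can still decode or alter it, so it does not by itself protect against manipulation.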
Because you have a potentially variable-length array of numbers as inputs, you need to encode the length somehow — either directly as a prefix, or indirectly by using a termination character.
For the inverse algorithm, just reverse the steps and you’ll get an array of numbers from 0 to 99 — using either the prefixed length or termination character to determine the size of the array — which you can convert to a human-readable string separated with commas.
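For comparison, the slower base-100 option and its inverse might look something like the following. The class name Base100Codec is hypothetical, and storing the count in a single leading byte (so at most 255 numbers) is a simplification I chose for the sketch, not a requirement of the approach.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Base64;

public class Base100Codec {
    static final BigInteger HUNDRED = BigInteger.valueOf(100);

    static String encode(String csv) {
        String[] parts = csv.split(",");
        if (parts.length > 255) {
            throw new IllegalArgumentException("this sketch supports at most 255 numbers");
        }
        // Accumulate the values as digits of one base-100 number.
        BigInteger n = BigInteger.ZERO;
        for (String part : parts) {
            int v = Integer.parseInt(part.trim());
            if (v < 0 || v > 99) {
                throw new IllegalArgumentException("out of range: " + v);
            }
            n = n.multiply(HUNDRED).add(BigInteger.valueOf(v));
        }
        // Prefix the count so that "0,0" and "0,0,0,0" encode differently.
        byte[] digits = n.toByteArray();
        byte[] payload = new byte[digits.length + 1];
        payload[0] = (byte) parts.length;
        System.arraycopy(digits, 0, payload, 1, digits.length);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(payload);
    }

    static String decode(String text) {
        byte[] payload = Base64.getUrlDecoder().decode(text);
        int count = payload[0] & 0xFF;
        BigInteger n = new BigInteger(Arrays.copyOfRange(payload, 1, payload.length));
        // Peel off base-100 digits from the least significant end.
        String[] parts = new String[count];
        for (int i = count - 1; i >= 0; i--) {
            BigInteger[] quotientAndRemainder = n.divideAndRemainder(HUNDRED);
            parts[i] = quotientAndRemainder[1].toString();
            n = quotientAndRemainder[0];
        }
        return String.join(",", parts);
    }
}
```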
If you have access to the original information in a raw binary form before it's encoded as a string, use that instead (but please post a question with the input format requirements for that information).