I want to choose an encoding scheme for data storage. I have very little memory available. Which encoding would make the best use of the available space?
ANSI, UTF, or something else?
The data consists only of capital letters (A-Z).
If you know the frequency distribution of letters, Huffman Coding is a good balance between complexity, speed and efficiency.
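As a rough illustration of the Huffman approach, here is a minimal Python sketch that builds a code table from letter counts. The function name `huffman_codes` and the sample counts are made up for this example; a real implementation would also need to store or agree on the table and pack the resulting bits into bytes.

```python
import heapq

def huffman_codes(freqs):
    """Build a prefix-free code table {symbol: bit string} from {symbol: weight}."""
    # Heap entries are (weight, unique id, node); the id breaks ties so that
    # nodes themselves are never compared.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a single symbol still needs one bit
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest nodes into one internal node.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (left, right)))
        next_id += 1
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: a single letter
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes

# Hypothetical counts: frequent letters end up with shorter codes.
sample = huffman_codes({"A": 4, "B": 2, "C": 1})
```

With these sample counts, "A" gets a 1-bit code while "B" and "C" get 2-bit codes.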
If you don’t know the distribution of letters or they are random, just store them 5 bits at a time. For example, consider the string “ABCDE”. The letter numbers are 0, 1, 2, 3, 4. Converted to 5-bit binary, this is: 00000 00001 00010 00011 00100
Now you just group every 8 bits into bytes: 00000000 01000100 00110010 0xxxxxxx (the x bits are unused padding)
You need to store the length too, so that you know that there is no useful data in the last byte’s 7 bits.
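The 5-bit scheme can be sketched in Python roughly as follows. The names `pack5` and `unpack5` are made up for this example, and it assumes the input contains only the letters A-Z; the letter count is returned alongside the bytes so the padding bits can be ignored when decoding.

```python
def pack5(text):
    """Pack A-Z text at 5 bits per letter; returns (letter count, packed bytes)."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently in acc
    out = bytearray()
    for ch in text:
        code = ord(ch) - ord("A")
        if not 0 <= code < 26:
            raise ValueError("only A-Z is supported: %r" % ch)
        acc = (acc << 5) | code
        nbits += 5
        while nbits >= 8:            # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1      # keep only the leftover bits
    if nbits:                        # pad the last partial byte with zeros
        out.append((acc << (8 - nbits)) & 0xFF)
    return len(text), bytes(out)

def unpack5(count, data):
    """Inverse of pack5: read `count` letters, ignoring trailing padding bits."""
    acc = 0
    nbits = 0
    letters = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 5 and len(letters) < count:
            nbits -= 5
            letters.append(chr(ord("A") + ((acc >> nbits) & 0x1F)))
        acc &= (1 << nbits) - 1
    return "".join(letters)

count, packed = pack5("ABCDE")
```

For "ABCDE" this yields 4 bytes (hex 00 44 32 00) plus the count 5, compared with 5 bytes at one byte per letter.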
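If code space is of no concern and you just want to pack the strings as tightly as you can, you could use arithmetic coding even with a uniform frequency distribution to pack each character into about log2(26) bits on average, which is slightly less than 5 (namely, about 4.7 bits). Huffman coding with a uniform distribution comes close to that, at about 4.77 bits per character, because each Huffman code must be a whole number of bits.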
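One simple way to get near log2(26) bits per letter with a uniform distribution is to treat the whole string as a single base-26 number. This is a stand-in for a full arithmetic coder, not the same algorithm, and the names `pack26` and `unpack26` are made up for this sketch. It relies on Python's arbitrary-size integers; on a memory-constrained device you would more likely pack fixed groups, for example 13 letters into 8 bytes.

```python
def pack26(text):
    """Pack A-Z text as one big base-26 integer; returns (letter count, bytes)."""
    value = 0
    for ch in text:
        code = ord(ch) - ord("A")
        if not 0 <= code < 26:
            raise ValueError("only A-Z is supported: %r" % ch)
        value = value * 26 + code
    # Smallest byte count that can hold any string of this length.
    nbytes = ((26 ** len(text) - 1).bit_length() + 7) // 8
    return len(text), value.to_bytes(nbytes, "big")

def unpack26(count, data):
    """Inverse of pack26: peel off base-26 digits, last letter first."""
    value = int.from_bytes(data, "big")
    letters = []
    for _ in range(count):
        value, code = divmod(value, 26)
        letters.append(chr(ord("A") + code))
    return "".join(reversed(letters))

n, packed = pack26("A" * 100)
```

For 100 letters this needs 59 bytes, versus 63 bytes at 5 bits per letter and 100 bytes at one byte per letter.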