If I were creating a videogame level editor in AS3 or .NET with a string-based level format, that can be copied, pasted and emailed, how much data could I encode into each character? What is important is getting the maximum amount of data for the minimum amount of characters displayed on the screen, regardless of how many bytes the computer is actually using to store these characters.
For example, if I wanted to store the horizontal position of an object in 1 string character, how many possible values could that have? Are there any characters that can’t be sent over the internet, or that can’t be copied and pasted? What difference would things like UTF-8 make? Answers please for either AS3 or C#/.NET, or both.
2nd update: ok so Flash uses UTF-16 for its String class. There are lots of control characters that I cannot use. How could I manage which characters are ok to use? Just a big lookup table? And can operating systems and browsers handle UTF-16 to the extent that you can safely copy and paste a UTF-16 string into an email, Notepad, etc?
Updated: “update 1”, “update 2”
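You can store 8 bits in a single character with an 8-bit ANSI code page. Plain ASCII defines only 7 bits (128 code points), and in UTF-8 only those 128 code points fit into a single byte; every other character takes 2 to 4 bytes.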
But, for example, if you want to use ASCII you shouldn’t use the first 32 code points (0x00 to 0x1F) or 0x7F, because they represent control characters such as null, escape, start of text and end of text, which cannot be copied and pasted reliably. That leaves 95 usable characters in 7-bit ASCII (0x20 to 0x7E), or at most 223 (256 − 33) different values per character in an 8-bit code page.
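The arithmetic can be checked with a few lines of code. This is a minimal sketch in Python (chosen for brevity; the same loop is easy to write in AS3 or C#), and `CONTROL` and `usable_count` are illustrative names, not part of any library. It only excludes the C0 controls and DEL, so the 8-bit figure is an upper bound: some code pages leave further positions undefined.

```python
# Control characters to skip: C0 controls 0x00-0x1F plus DEL (0x7F).
CONTROL = set(range(0x00, 0x20)) | {0x7F}

def usable_count(limit):
    """Count code points below `limit` that are not C0 controls or DEL."""
    return sum(1 for cp in range(limit) if cp not in CONTROL)

ascii_usable = usable_count(0x80)   # 7-bit ASCII
byte_usable = usable_count(0x100)   # any 8-bit code page (upper bound)

print(ascii_usable, byte_usable)  # 95 223
```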
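If you use UTF-16, each code unit is 2 bytes = 16 bits, so you have 65,536 values minus the control characters, the surrogates (0xD800 to 0xDFFF) and the unassigned code points to store your information.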
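To get a rough feel for how many 16-bit values remain, one option is to filter the Basic Multilingual Plane by Unicode general category. The sketch below uses Python's standard `unicodedata` module; the `UNSAFE` set is my own assumption about what to exclude, not a standard list, and the resulting count depends on the Unicode version shipped with the interpreter. Passing this filter does not guarantee that a character renders in a given font or survives every mail client.

```python
import unicodedata

# Categories treated as unsafe here (an assumption, not a standard list):
# Cc control, Cf format, Cs surrogate, Co private use, Cn unassigned,
# Zs/Zl/Zp separators, and Mn/Mc/Me combining marks.
UNSAFE = {"Cc", "Cf", "Cs", "Co", "Cn", "Zs", "Zl", "Zp", "Mn", "Mc", "Me"}

def is_candidate(cp):
    """True if the code point's general category is not in UNSAFE."""
    return unicodedata.category(chr(cp)) not in UNSAFE

# Every code point that fits in a single UTF-16 code unit.
candidates = [cp for cp in range(0x10000) if is_candidate(cp)]
print(len(candidates))
```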
See the images at the end of this post!
update 1:
If the values do not need to be edited by hand, without a tool (a C# tool, a JavaScript-based web page, …), you can alternatively Base64-encode or zip + Base64-encode your data. This solution avoids the problem that you describe in your 2nd update: “There are lots of control characters that I cannot use. How could I manage which characters are ok to use?”
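A minimal sketch of the zip + Base64 idea, written in Python with only the standard `zlib` and `base64` modules; `pack` and `unpack` are hypothetical helper names, and the sample level bytes are made up. Base64 carries 6 bits per character, so the gain comes from compressing first.

```python
import base64
import zlib

def pack(level_bytes):
    """Compress the raw level data, then Base64-encode it to plain ASCII."""
    compressed = zlib.compress(level_bytes, 9)
    return base64.urlsafe_b64encode(compressed).decode("ascii")

def unpack(text):
    """Reverse of pack(): Base64-decode, then decompress."""
    return zlib.decompress(base64.urlsafe_b64decode(text.encode("ascii")))

# Made-up level data with plenty of repetition, as tile maps often have.
level = bytes([10, 20, 30] * 50)
code = pack(level)
print(len(level), len(code))
```

The URL-safe variant is used here so the string contains only letters, digits, `-`, `_` and `=` padding, which tend to survive email and chat clients.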
If this is not an option, you cannot avoid using some kind of lookup table.
The shortest way to write a lookup table is:
or you can code it like this:
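A minimal sketch of the lookup-table idea (not the answer's original snippet), written in Python for brevity; the same shape ports directly to AS3 or C#. The 64-character `ALPHABET` and the helper names are illustrative assumptions: any set of characters you have verified to survive copy and paste would do.

```python
# Hypothetical alphabet: 64 characters that are easy to copy and paste.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)
# Reverse table: character -> value.
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def encode_value(value):
    """Map an integer in [0, len(ALPHABET)) to one character."""
    if not 0 <= value < len(ALPHABET):
        raise ValueError("value out of range for this alphabet")
    return ALPHABET[value]

def decode_value(ch):
    """Map one character back to its integer value."""
    return INDEX[ch]

print(encode_value(37), decode_value("l"))  # l 37
```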
update 2:
For Unicode (UTF-16) you can use this table: http://www.tamasoft.co.jp/en/general-info/unicode.html
You should not use any character that is shown there as a placeholder symbol or is empty.
So you cannot store 50,000 possible values in one UTF-16 character if the string must survive copy and paste. You need a special encoder and must use 2 UTF-16 characters, like:
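One way to sketch such an encoder (an illustration, not the answer's original example) is to treat the two characters as two digits in base N, where N is the size of a safe alphabet; with N = 256 a pair covers 65,536 values. The Python below builds its alphabet from the first 256 letters starting at U+0041, which is purely an assumption for demonstration: test your own alphabet against the applications you care about. `encode_pair` and `decode_pair` are hypothetical names.

```python
import unicodedata
from itertools import islice

def letters():
    """Yield BMP characters whose general category is a letter (L*)."""
    for cp in range(0x41, 0x10000):
        if unicodedata.category(chr(cp)).startswith("L"):
            yield chr(cp)

# Assumed alphabet: the first 256 letters (ASCII and accented Latin).
ALPHABET = "".join(islice(letters(), 256))
BASE = len(ALPHABET)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def encode_pair(value):
    """Encode an integer in [0, BASE * BASE) as two characters."""
    if not 0 <= value < BASE * BASE:
        raise ValueError("value does not fit in two characters")
    high, low = divmod(value, BASE)
    return ALPHABET[high] + ALPHABET[low]

def decode_pair(text):
    """Decode the two characters produced by encode_pair()."""
    return INDEX[text[0]] * BASE + INDEX[text[1]]

pair = encode_pair(49999)
print(pair, decode_pair(pair))
```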
(source: asciitable.com)