I have an API that receives Unicode data, but I only need to store

Asked: May 24, 2026 (2026-05-24T07:05:16+00:00)

I have an API that receives Unicode data, but I only need to store ASCII in it. I’d like to compress & obfuscate (or encrypt) the string values that will be persisted in Unicode.

My desire is to either compress this schema data, or to encrypt it from prying eyes. I don’t think it’s possible to do both well.

Given that I want to restrict my source data to valid, printable ASCII, how can I “compress” the original string value into one that is smaller, obfuscated, or both?

Here is how I imagine this working (though you may have a better way):

1. The source code takes a given String as input.
2. The byte representation of that string is taken (UTF8, ASCII, you decide).
3. Some magic happens (this is the part I need your help on).
4. The resulting bytes are converted into an int or long (no decimal points).
5. The number is converted into a corresponding character using this utility:
   http://baseanythingconvert.codeplex.com/SourceControl/changeset/view/77855#1558651

(Note that the utility will be used to enforce the constraint that the “final” Unicode name must not include any of the following characters: ‘/’, ‘\’, ‘#’, ‘?’ or ‘%’.)

Background

The Microsoft Azure Table has an API that accepts Unicode data for the storage of property names. This is a schema-free database (so columns can be created ad hoc), and therefore the schema is stored per row. The downside is that this schema data is stored on disk multiple times, and it is also transmitted over the wire, quite redundantly, in an XML blob.

In addition, I’m working on a utility that dynamically encrypts/decrypts Azure Table Data, but the schema is unencrypted. I’d like to mask or obfuscate this header information somehow.

1 Answer

Editorial Team · Answer 1 · 2026-05-24T07:05:18+00:00

These are just some ideas.

Isn’t step 3 actually straightforward (just compress and/or encrypt the data into different bytes)? For 7-bit ASCII, you can also, before compressing and/or encrypting, store the data by packing the bits so they fit into fewer bytes.
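
The bit packing mentioned above can be sketched as follows. This is only a minimal illustration in Python, not the asker's code; the names pack7 and unpack7 are hypothetical. It assumes the input is pure 7-bit ASCII and that the original length is stored alongside the packed bytes, because the padding bits in the last byte could otherwise be read back as an extra character.

```python
def pack7(text: str) -> bytes:
    """Pack 7-bit ASCII characters back to back, 8 characters per 7 bytes."""
    data = text.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for b in data:
        acc = (acc << 7) | b
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet written
    if nbits:
        # Left-align the leftover bits in a final, zero-padded byte.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack7(packed: bytes, length: int) -> str:
    """Reverse pack7; `length` is the original character count."""
    acc = 0
    nbits = 0
    chars = []
    for b in packed:
        acc = (acc << 8) | b
        nbits += 8
        while nbits >= 7 and len(chars) < length:
            nbits -= 7
            chars.append(chr((acc >> nbits) & 0x7F))
        acc &= (1 << nbits) - 1
    return "".join(chars)


name = "PartitionKey"
packed = pack7(name)
print(len(name), len(packed))
print(unpack7(packed, len(name)))
```

Packing saves one byte in eight at best, so it is most useful as a step before encryption, or on names too short for a general-purpose compressor to shrink.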

If you can use UTF-32, UTF-8, and so on in step 5, you have access to nearly every code point in the Unicode Standard, up to 0x10FFFD. The exceptions include noncharacters such as 0xFFFF and surrogate code points such as 0xD800, which are not valid characters on their own.
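
As a rough sketch of steps 4 and 5 under those restrictions, the Python below treats the bytes as one large integer and writes it in a base whose digits are Unicode characters. The names is_usable, ALPHABET, encode and decode are hypothetical, and the alphabet (the Hangul Syllables block, U+AC00 to U+D7A3) is just one convenient choice. Whether the storage API round-trips these characters unchanged is an assumption that would need testing.

```python
FORBIDDEN = set("/\\#?%")  # characters the question says must not appear


def is_usable(cp: int) -> bool:
    """Reject surrogates, noncharacters, control codes and forbidden characters."""
    if 0xD800 <= cp <= 0xDFFF:                           # surrogate code points
        return False
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:  # noncharacters
        return False
    if cp < 0x20 or 0x7F <= cp <= 0x9F:                  # control codes
        return False
    return chr(cp) not in FORBIDDEN


# Assumption: Hangul Syllables as the digit set (11172 usable code points).
ALPHABET = [chr(cp) for cp in range(0xAC00, 0xD7A4) if is_usable(cp)]


def encode(data: bytes, alphabet=ALPHABET) -> str:
    # A leading 0x01 byte keeps leading zero bytes from being lost.
    n = int.from_bytes(b"\x01" + data, "big")
    base = len(alphabet)
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(alphabet[r])
    return "".join(reversed(digits))


def decode(text: str, alphabet=ALPHABET) -> bytes:
    index = {ch: i for i, ch in enumerate(alphabet)}
    base = len(alphabet)
    n = 0
    for ch in text:
        n = n * base + index[ch]
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the 0x01 marker byte


encoded = encode(b"PartitionKey")
print(len(encoded), decode(encoded))
```

This reduces the number of characters, not necessarily the number of bytes: each of these characters takes three bytes in UTF-8 and two in UTF-16, so any saving depends on how the store encodes and measures names.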
