I’m working on a web app that needs to take a list of files on a query string (specifically a GET and not a POST), something like:
http://site.com/app?things=/stuff/things/item123,/stuff/things/item456,/stuff/things/item789
I want to shorten that string:
http://site.com/app?things=somekindofencoding
The string isn’t terribly long; it varies from 20 to 150 chars. Something that short isn’t really suitable for GZip, but it does have an awful lot of repetition, so compression should be possible.
I don’t want a DB or Dictionary of strings – the URL will be built by a different application to the one that consumes it. I want a reversible compression that shortens this URL. It doesn’t need to be secure.
Is there an existing way to do this? I’m working in C#/.Net but would be happy to adapt an algorithm from some other language/stack.
If you can express the data in BNF, you could construct a parser for the data. Instead of sending the data, you could send the AST, where each node is identified by one character (or several if you have a lot of different nodes). In your example
we could have
You could then represent a list of files as path[id1,id2,…,idn], using 0, 1, 2 for the paths, and with the input being:
you’d then end up with
?things=2[123,456,789]1[1,46,7]
where /stuff/things/item is represented with 2 and /files/item/ is represented with 1. Each number within [...] is an id, so 2[123] would expand to /stuff/things/item123.
EDIT: The approach does not have to be static. If you have to discover the repeated items dynamically, you can use the same approach and pass the map between identifier and token. In that case the above example would be
which, if the grammar is this simple, would of course do better with
Compressing the repeated part to less than the unique value is possible with such a short string, but it will most likely have to be based on constraining the possible values; otherwise you risk actually increasing the size when “compressing”.
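For illustration, the static variant described above (a fixed table of path prefixes, each followed by a bracketed list of ids) could be sketched roughly as follows. This is only a sketch, not a definitive implementation: the PREFIXES table and the encode/decode names are made up for the example, ids are assumed to be plain digits, and both applications are assumed to share the same table. It is written in Python for brevity, but nothing in it is language-specific, so it should port to C# easily.

```python
import re

# Hypothetical prefix table; the producing and consuming apps must both know it.
PREFIXES = {
    "1": "/files/item/",
    "2": "/stuff/things/item",
}

_ID = re.compile(r"[0-9]+")
_GROUP = re.compile(r"([^\[\],]+)\[([0-9]+(?:,[0-9]+)*)\]")


def encode(paths):
    """Collapse consecutive paths that share a known prefix into code[id,id,...]."""
    groups = []  # list of (code, [ids]) in input order
    for path in paths:
        for code, prefix in PREFIXES.items():
            rest = path[len(prefix):]
            if path.startswith(prefix) and _ID.fullmatch(rest):
                break
        else:
            raise ValueError(f"no known prefix for {path!r}")
        if groups and groups[-1][0] == code:
            groups[-1][1].append(rest)
        else:
            groups.append((code, [rest]))
    return "".join(f"{code}[{','.join(ids)}]" for code, ids in groups)


def decode(encoded):
    """Expand code[id,id,...] groups back into the full list of paths."""
    paths = []
    pos = 0
    while pos < len(encoded):
        match = _GROUP.match(encoded, pos)
        if match is None or match.group(1) not in PREFIXES:
            raise ValueError(f"malformed input at position {pos}")
        prefix = PREFIXES[match.group(1)]
        paths.extend(prefix + item_id for item_id in match.group(2).split(","))
        pos = match.end()
    return paths


if __name__ == "__main__":
    files = [
        "/stuff/things/item123", "/stuff/things/item456", "/stuff/things/item789",
        "/files/item/1", "/files/item/46", "/files/item/7",
    ]
    packed = encode(files)
    print(packed)
    print(decode(packed) == files)
```

One thing to keep in mind: square brackets are reserved characters in URLs and may end up percent-encoded in the query string, which would eat into the savings. Picking delimiters that are safe unescaped (for example . or ~) may work better in practice.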