Is it possible to find and replace repeated sequences of characters in a string using C#? I’m trying to reduce the size of a base64 string converted from a JPEG image, and I’ve noticed that it contains many repeated character sequences, such as:
6qdQAUUxJA7uuCGQ8g/wA6fQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFYXiFL5b7TrmwtzM8Xmr7KWUAE+
If there were a way to replace the repeated sequences with something like this, the string would be much smaller overall:
[QAUUUUAFFFFABRRR, 18]
The format is [REPEATED-CHARACTERS, NUMBER-OF-TIMES].
Would this be possible to do? Thanks for the help. 🙂
You’re essentially trying to come up with your own lossless compression algorithm. Formats like zip already do exactly what you’re asking for: their DEFLATE algorithm replaces repeated sequences with short references back to an earlier copy, except that it works on bytes rather than on characters in a string.
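To show that the scheme in the question is doable, and where it falls short, here is a minimal C# sketch of it. It assumes a fixed block length chosen by the caller (16, matching QAUUUUAFFFFABRRR) and only collapses copies that sit back to back; `EncodeRuns`, `DecodeRuns` and the sample string are hypothetical, not a library API. DEFLATE is far more general: a match can have almost any length and point back to any earlier position within its 32 KB window.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical sample modelled on the question: a 16-character block repeated 5 times.
string sample = "6qd" + string.Concat(Enumerable.Repeat("QAUUUUAFFFFABRRR", 5)) + "QAUUUUAFYX";
string packed = EncodeRuns(sample, 16);
Console.WriteLine(packed); // 6qd[QAUUUUAFFFFABRRR, 5]QAUUUUAFYX

// Replaces back-to-back copies of any blockLength-character block with "[block, count]".
// '[', ']', ',' and ' ' are not in the base64 alphabet, so tokens cannot be confused with data.
static string EncodeRuns(string s, int blockLength)
{
    var sb = new StringBuilder();
    int i = 0;
    while (i < s.Length)
    {
        if (i + blockLength <= s.Length)
        {
            // Count how many times the block starting at i repeats immediately after itself.
            int count = 1;
            while (i + (count + 1) * blockLength <= s.Length &&
                   string.CompareOrdinal(s, i, s, i + count * blockLength, blockLength) == 0)
            {
                count++;
            }
            if (count >= 2)
            {
                sb.Append('[').Append(s, i, blockLength).Append(", ").Append(count).Append(']');
                i += count * blockLength;
                continue;
            }
        }
        sb.Append(s[i]); // no run starts here: copy one character and try the next position
        i++;
    }
    return sb.ToString();
}

// Expands every "[block, count]" token back into count copies of block.
static string DecodeRuns(string s)
{
    var sb = new StringBuilder();
    int i = 0;
    while (i < s.Length)
    {
        if (s[i] == '[')
        {
            int comma = s.IndexOf(", ", i, StringComparison.Ordinal);
            int close = s.IndexOf(']', comma);
            string block = s.Substring(i + 1, comma - i - 1);
            int count = int.Parse(s.Substring(comma + 2, close - comma - 2), CultureInfo.InvariantCulture);
            for (int k = 0; k < count; k++)
            {
                sb.Append(block);
            }
            i = close + 1;
        }
        else
        {
            sb.Append(s[i]);
            i++;
        }
    }
    return sb.ToString();
}
```

This only pays off when the repeats are consecutive and exactly `blockLength` characters long; a repeat that appears again further away, or with a different length, is missed entirely.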
Popular compression algorithms are virtually guaranteed to be more efficient than anything you can design and implement in a reasonable amount of time. For one, they can find patterns that aren’t visible in the base64 string: base64 turns every 3 bytes into 4 characters, so the same run of bytes produces different characters depending on where it falls relative to those 3-byte groups.
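The alignment problem is easy to see directly. This sketch (the byte values are made up for illustration) base64-encodes the same twelve bytes twice, once starting at offset 0 and once after a single extra leading byte:

```csharp
using System;
using System.Linq;

// Hypothetical bytes: a 3-byte sequence repeated four times (12 bytes).
byte[] run = Enumerable.Repeat(new byte[] { 0x40, 0x05, 0x14 }, 4)
                       .SelectMany(b => b)
                       .ToArray();

// The same 12 bytes, but starting one byte later in the buffer.
byte[] shifted = new byte[] { 0x00 }.Concat(run).ToArray();

string aligned = Convert.ToBase64String(run);
string offByOne = Convert.ToBase64String(shifted);

Console.WriteLine(aligned);  // QAUUQAUUQAUUQAUU
Console.WriteLine(offByOne); // AEAFFEAFFEAFFEAFFA==
```

The underlying bytes are identical, yet the second string never contains QAUU. A character-level search treats the two as unrelated, while a byte-level compressor sees the same repeated sequence in both.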
So why not just use one of them to compress the binary data before base64-encoding it, instead of the other way around?
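A minimal sketch of that approach using .NET’s built-in `GZipStream` from `System.IO.Compression`. `CompressToBase64` and `DecompressFromBase64` are hypothetical helper names, and the repetitive sample bytes stand in for a real JPEG, which you would load with something like `File.ReadAllBytes`.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical stand-in for JPEG bytes: a 4-byte pattern repeated 1000 times.
// For a real image: byte[] sample = File.ReadAllBytes("photo.jpg"); (placeholder path)
byte[] sample = Enumerable.Repeat(new byte[] { 0x40, 0x05, 0x14, 0x51 }, 1000)
                          .SelectMany(b => b)
                          .ToArray();

string plain = Convert.ToBase64String(sample);
string packed = CompressToBase64(sample);
Console.WriteLine($"plain base64: {plain.Length} chars, compressed: {packed.Length} chars");

// Compress the raw bytes first, then base64-encode the compressed result.
static string CompressToBase64(byte[] data)
{
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
    {
        gzip.Write(data, 0, data.Length);
    } // disposing the GZipStream flushes the final compressed block into output
    return Convert.ToBase64String(output.ToArray());
}

// Reverse the steps: base64-decode, then decompress.
static byte[] DecompressFromBase64(string base64)
{
    using var input = new MemoryStream(Convert.FromBase64String(base64));
    using var gzip = new GZipStream(input, CompressionMode.Decompress);
    using var result = new MemoryStream();
    gzip.CopyTo(result);
    return result.ToArray();
}
```

JPEG data is already compressed, so on a typical photo GZip may save little or even add a few bytes of header overhead; the long repeated run in the question suggests this particular image still has some redundancy, but it is worth comparing the two lengths on real images. `DeflateStream` (no GZip header) or `BrotliStream` (.NET Core 2.1 and later) can be swapped in the same way.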