I remember hearing that computers sometimes save images in interesting ways. For example, when several adjacent pixels in an image are exactly the same color, they are stored as a note that the next 30 or so pixels are all “red,” rather than recording the red value for each pixel individually.
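The scheme described here is usually called run-length encoding. A minimal sketch in Python, assuming pixels are represented as simple color names (real image formats store numeric values, and the function names `rle_encode` and `rle_decode` are made up for this illustration):

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse each run of identical values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

# One hypothetical row of an image: 30 red pixels, 2 blue, 5 red.
row = ["red"] * 30 + ["blue"] * 2 + ["red"] * 5
encoded = rle_encode(row)
print(encoded)  # [('red', 30), ('blue', 2), ('red', 5)]
assert rle_decode(encoded) == row
```

Note that this only helps when runs are long. A row where every pixel differs from its neighbor would come out larger than it went in, because each pixel would also carry a count.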
Sometimes you download large programs that are gigabytes in size but start as a 900 KB file. I guess these are simply tools that connect you to an FTP server, which then sends you the data, usually among other things.
So why can't we store gigabytes of information as kilobytes if we do not need to access the information immediately, say for long-term storage?
Take this example: a program is asked to compress a file that is 1024 kilobytes in size. The compression program detects that the bytes in memory simply form a pattern of {1,0,0,1,0,0,…}. It creates an algorithm that it can recognize later if asked to uncompress the information, and stores the length of memory that this pattern should occupy. The space that the original information occupied is now much smaller.
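As a rough sketch of that idea in Python: the code below looks for the shortest unit that, repeated, reproduces the whole input, and stores only that unit plus the total length. The names `compress_pattern` and `decompress_pattern` are invented for this example, and the brute-force search is only an illustration, not how real compressors find repetition.

```python
def compress_pattern(data):
    """Return (unit, total_length), where unit repeated and truncated gives data."""
    n = len(data)
    for period in range(1, n + 1):
        unit = data[:period]
        # Repeat the candidate unit past the end, then trim to the exact length.
        if (unit * (n // period + 1))[:n] == data:
            return unit, n
    return data, n  # only reached for empty input

def decompress_pattern(unit, total_length):
    """Rebuild the original bytes from the repeating unit and the stored length."""
    if not unit:
        return b""
    return (unit * (total_length // len(unit) + 1))[:total_length]

# 1024 KB of the pattern 1,0,0,1,0,0,...
size = 1024 * 1024
original = (bytes([1, 0, 0]) * (size // 3 + 1))[:size]

unit, length = compress_pattern(original)
print(len(unit), length)  # 3 1048576
assert decompress_pattern(unit, length) == original
```

Here the megabyte shrinks to three bytes plus a length. For data with no repeating unit, though, the "unit" ends up being the whole file, so nothing is saved, and the search becomes very slow.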
If a single algorithm for the entire file would be too complicated, perhaps the computer can split the data into sections, each described by its own algorithm that outputs a smaller length of data when decompression is requested.
Is this a realistic approach to compressing data? I thought this might already be in use, because I sometimes see a program, Windows 7 for example, “expand” data. Is this what the program is really doing?
LZW works by building a dictionary of bit strings and then using references to that dictionary in place of the strings themselves. Other compression algorithms work in different ways, but the idea is always to find a smaller representation. Some compression, such as JPEG, loses data; this is fine for media where our eyes or ears are able to fill in the missing data. Other methods, such as LZW, are “lossless”: they don’t lose data.
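To make the dictionary idea concrete, here is a minimal LZW sketch in Python. It makes some simplifying assumptions: it works on whole bytes rather than bit strings, the dictionary is allowed to grow without limit, and the output is a plain list of integer codes rather than a packed bit stream as real implementations use.

```python
def lzw_compress(data):
    """Encode bytes as a list of dictionary codes."""
    # Start with one entry for every possible single byte.
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate  # keep extending the known string
        else:
            codes.append(table[current])    # emit the longest known string
            table[candidate] = len(table)   # learn the new, longer string
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the bytes by reconstructing the same dictionary on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

text = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
assert lzw_decompress(codes) == text  # lossless round trip
print(len(text), "bytes ->", len(codes), "codes")
```

The decompressor never receives the dictionary itself. It rebuilds the same entries in the same order from the codes alone, which is why only the references need to be stored.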
Some compression programs, such as PKZIP and WinZip, use various algorithms depending on the data. This approach can’t be used for streaming, but works well for files.
The whole area is very complex; you could spend a lifetime on it and still not know everything about it. Good luck in your pursuit.