I have read a few links on the topic of file formats and encoding, but how is it done?
If all data is binary, what splits data into different file formats? What exactly does encoding the data involve? How is it done?
The main ways to determine a file's format are its file extension or its MIME type, and less frequently “magic numbers” (a fixed sequence of bytes at the start of the file that identifies the format).
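A magic-number check looks at the file's first few bytes rather than its name. As a minimal sketch, PNG files always begin with the same 8-byte signature (0x89, the letters "PNG", then CR, LF, 0x1A, LF), so a reader can compare against it. The function name `looks_like_png` is hypothetical; other formats have their own signatures and would need their own checks.

```c
#include <stdio.h>
#include <string.h>

/* The first 8 bytes of every PNG file, as fixed by the PNG specification. */
static const unsigned char PNG_SIGNATURE[8] = {
    0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'
};

/* Hypothetical helper: returns 1 if the stream starts with the PNG
   signature, 0 otherwise. Rewinds to the start of the stream first. */
int looks_like_png(FILE *f)
{
    unsigned char header[sizeof PNG_SIGNATURE];

    if (fseek(f, 0, SEEK_SET) != 0)
        return 0;
    if (fread(header, 1, sizeof header, f) != sizeof header)
        return 0; /* too short to be a PNG */
    return memcmp(header, PNG_SIGNATURE, sizeof header) == 0;
}
```

Unlike an extension, this check still works if the file has been renamed, because it inspects the data itself.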
The operating system or application checks the file extension to decide what to do with the file (which application to open it in, or which part of its code to run for it).
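Extension-based dispatch can be sketched as a lookup on the text after the last dot in the name. The `handler_t` values and the extension table below are hypothetical stand-ins for whatever an OS or application actually registers, and the sketch assumes a bare file name rather than a full path.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handlers; a real program would map these to decoding code. */
typedef enum { HANDLER_UNKNOWN, HANDLER_PNG, HANDLER_JPEG, HANDLER_TEXT } handler_t;

/* Case-insensitive ASCII comparison of two NUL-terminated strings. */
static int ext_equals(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Picks a handler from the extension of a bare file name.
   Only the name is inspected; the file's bytes are never read. */
handler_t handler_for_filename(const char *name)
{
    static const struct { const char *ext; handler_t handler; } table[] = {
        { "png",  HANDLER_PNG  },
        { "jpg",  HANDLER_JPEG },
        { "jpeg", HANDLER_JPEG },
        { "txt",  HANDLER_TEXT },
    };
    const char *dot = strrchr(name, '.');
    size_t i;

    if (dot == NULL || dot[1] == '\0')
        return HANDLER_UNKNOWN; /* no extension to go on */
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (ext_equals(dot + 1, table[i].ext))
            return table[i].handler;
    return HANDLER_UNKNOWN;
}
```

Because only the name is consulted, a PNG renamed to `photo.txt` would be routed to the text handler, which is one reason some programs also check magic numbers.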
MIME types are used where an extension (or file name) isn't always available. For example, when downloading a file over HTTP, the URI for a file might be something like ~.php?id=12973. The file type cannot be determined from this alone, but the server sends a “Content-Type” header (part of the HTTP protocol) stating the file's format, and the browser handles it accordingly. E.g. Content-Type: image/png tells the browser to pass the file to its PNG decoding code.
When the application knows what the file format is, it passes the data to code written specifically for that format. If the program has no code for a format, it cannot read it.
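Routing on a Content-Type value can be sketched like this, assuming the header value has already been extracted from the HTTP response. `content_type_is`, `handle_download`, and `decode_png` are hypothetical names; `decode_png` is a stub standing in for real PNG decoding code. Media types are compared case-insensitively, and parameters after `;` (such as `charset`) are ignored.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if a Content-Type value (e.g. "image/png" or
   "text/html; charset=utf-8") names the given media type, 0 otherwise. */
int content_type_is(const char *value, const char *media_type)
{
    size_t n = strlen(media_type);
    size_t i;

    while (*value == ' ' || *value == '\t') /* skip leading whitespace */
        value++;
    for (i = 0; i < n; i++) {
        if (value[i] == '\0' ||
            tolower((unsigned char)value[i]) != tolower((unsigned char)media_type[i]))
            return 0;
    }
    /* The media type must end here: end of string, a parameter, or whitespace. */
    return value[n] == '\0' || value[n] == ';' || value[n] == ' ' || value[n] == '\t';
}

/* Hypothetical stub standing in for a real PNG decoder. */
static int decode_png(const unsigned char *data, size_t len)
{
    (void)data;
    return len > 0;
}

/* Hands a downloaded body to format-specific code, or returns -1 when
   the program has no code for that format. */
int handle_download(const char *content_type, const unsigned char *body, size_t len)
{
    if (content_type_is(content_type, "image/png"))
        return decode_png(body, len);
    return -1; /* unsupported format: the program cannot read it */
}
```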
How a file is encoded is specific to its format. Most standard formats have a specification describing their binary encoding, and any application reading that file type must implement code that matches the specification (although in practice this is usually done with a library that already does the reading for you).
To give an example of how binary encodings work, consider an image. The specification might say that bytes 10-13 hold the width of the image and bytes 14-17 hold its height. To read those values, the code must explicitly read data of the correct size at the exact locations given by the spec, e.g.:
    fseek(f, 10, SEEK_SET);  // move to byte offset 10
    fread(&width, 4, 1, f);  // read 4 bytes from there into "width"
I think your confusion is “what separates pieces of data in binary files?” (In text files, this can be done with newlines, spaces, commas as in comma-separated values (CSV), etc.) The answer is that usually the size of the data determines where it ends: a specification states the binary type of each field (for example int32, meaning 32 bits / 4 bytes), so the reader knows exactly how many bytes to consume.
Other than that, file formats can be ambiguous, but this usually happens with text files, where the text itself can be read to guess the format. That isn't always possible, because a text file will often simply have the extension “.txt”, so the application may not know what character encoding the text uses. (This was, and still is, a problem for applications that do not use Unicode.)
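The two-line fseek/fread example above can be expanded into a complete reader for the same hypothetical image format (width in bytes 10-13, height in bytes 14-17). The answer doesn't specify a byte order, so this sketch assumes unsigned 32-bit little-endian fields; a real specification would state this. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Decodes 4 bytes as an unsigned little-endian integer (an assumption
   about the hypothetical format), independent of the host's byte order. */
static uint32_t read_u32_le(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}

/* Reads one 4-byte field at a fixed offset. Returns 0 on success, -1 on error. */
static int read_field_u32(FILE *f, long offset, uint32_t *out)
{
    unsigned char buf[4];

    if (fseek(f, offset, SEEK_SET) != 0)
        return -1;
    if (fread(buf, 1, sizeof buf, f) != sizeof buf)
        return -1; /* file ended before the field did */
    *out = read_u32_le(buf);
    return 0;
}

/* Hypothetical format: width in bytes 10-13, height in bytes 14-17.
   The field sizes, not any separator, tell us where each value ends. */
int read_image_size(FILE *f, uint32_t *width, uint32_t *height)
{
    if (read_field_u32(f, 10, width) != 0)
        return -1;
    return read_field_u32(f, 14, height);
}
```

Assembling the value byte by byte, rather than doing `fread(&width, 4, 1, f)` straight into an integer, keeps the result the same on any CPU; the direct read only works when the host's integer size and byte order happen to match the file's.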