I am currently working on the HTML5 File API, and I need to get binary file data.
The FileReader’s `readAsText` and `readAsDataURL` methods work fine, but `readAsBinaryString` returns the same data as `readAsText`.
I need binary data, but I’m getting a text string. Am I missing something?
2022 update: See the explanation below for why the OP was seeing what they were seeing, but the code there is outdated. In modern environments, you’d use the methods on the `Blob` interface (which `File` inherits):

- `arrayBuffer` for reading binary data (which you can then access via any of the typed arrays)
- `text` to read textual data
- `stream` for getting a `ReadableStream` for handling data via streaming (which allows you to do multiple transformations on the data without making multiple passes through it and/or use the data without having to keep all of it in memory)

Once you have the file from the file input (`const file = fileInput.files[0]` or similar), it’s literally just a matter of calling the method you need: `arrayBuffer` and `text` each return a promise for the data, and `stream` returns the `ReadableStream` directly. (See `ReadableStream` for an example of streams.) You might access the array buffer via a `Uint8Array` (`new Uint8Array(buffer)`).

Note in 2018: `readAsBinaryString` is outdated. For use cases where previously you’d have used it, these days you’d use `readAsArrayBuffer` (or in some cases, `readAsDataURL`) instead.

`readAsBinaryString` says that the data must be represented as a binary string, where every byte is represented by an integer in the range 0..255.

JavaScript originally didn’t have a "binary" type (typed arrays* came later, first through the Khronos Typed Array specification written for WebGL and then as part of ECMAScript 2015), and so they went with a String with the guarantee that no character stored in the String would be outside the range 0..255. (They could have gone with an array of Numbers instead, but they didn’t; perhaps large Strings are more memory-efficient than large arrays of Numbers, since Numbers are floating-point.)
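To make the two notes above concrete, here is a minimal sketch rather than the original answer’s code. It assumes an environment where `Blob.prototype.arrayBuffer` exists (current browsers, or Node.js 18 and later). The helper names `readBytes` and `bytesToBinaryString` are made up for illustration.

```javascript
// Read any Blob (a File is a Blob) as raw bytes.
async function readBytes(blob) {
  const buffer = await blob.arrayBuffer(); // resolves to an ArrayBuffer
  return new Uint8Array(buffer);           // byte-by-byte view of that buffer
}

// Build the legacy "binary string" form: one character per byte,
// so every char code is guaranteed to be in the range 0..255.
function bytesToBinaryString(bytes) {
  let result = "";
  for (const byte of bytes) {
    result += String.fromCharCode(byte);
  }
  return result;
}

// Usage in a browser, assuming `fileInput` is an <input type="file">:
//   const bytes = await readBytes(fileInput.files[0]);
//   const legacy = bytesToBinaryString(bytes);
```

In most cases you can work with the `Uint8Array` directly; the binary-string conversion only matters when some older API still expects that form.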
If you’re reading a file that’s mostly text in a western script (mostly English, for instance), then that string is going to look a lot like text. If you read a file with Unicode characters in it, you should notice a difference, since JavaScript strings are UTF-16** (details below) and so some characters will have values above 255, whereas a "binary string" according to the File API spec wouldn’t have any values above 255 (you’d have two individual "characters" for the two bytes of the Unicode code point).
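A small sketch of that difference, using Ω (U+03A9) as an arbitrary example character and assuming the file is stored as UTF-8. `TextEncoder` stands in for the bytes on disk, and the variable names are illustrative only.

```javascript
const text = "Ω"; // U+03A9, a single UTF-16 code unit

// The bytes a UTF-8 file containing that character would hold: 0xCE 0xA9.
const fileBytes = new TextEncoder().encode(text);

// What a "binary string" of those bytes looks like: one char per byte.
const binaryString = Array.from(fileBytes, (b) => String.fromCharCode(b)).join("");

console.log(text.length, text.charCodeAt(0)); // 1 937
console.log(binaryString.length);             // 2
console.log(binaryString.charCodeAt(0), binaryString.charCodeAt(1)); // 206 169
```

The decoded text has one "character" with a value above 255, while the binary string has two, each within 0..255.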
If you’re reading a file that’s not text at all (an image, perhaps), you’ll probably still get a very similar result between `readAsText` and `readAsBinaryString`, but with `readAsBinaryString` you know that there won’t be any attempt to interpret multi-byte sequences as characters. You don’t know that if you use `readAsText`, because `readAsText` will use an encoding determination to try to figure out what the file’s encoding is and then map it to JavaScript’s UTF-16 strings.

You can see the effect if you create a file and store it in something other than ASCII or UTF-8. (In Windows you can do this via Notepad; the "Save As" dialog has an encoding drop-down with "Unicode" on it, by which, looking at the data, they seem to mean UTF-16; I’m sure Mac OS and *nix editors have a similar feature.) Here’s a page that dumps the result of reading a file both ways:
If I use that with a "Testing 1 2 3" file stored in UTF-16, here are the results I get:
As you can see, `readAsText` interpreted the characters and so I got 13 (the length of "Testing 1 2 3"), and `readAsBinaryString` didn’t, and so I got 28 (the two-byte BOM plus two bytes for each character).

* `XMLHttpRequest.response` with `responseType = "arraybuffer"` is supported in HTML 5.

** "JavaScript strings are UTF-16" may seem like an odd statement; aren’t they just Unicode? No, a JavaScript string is a series of UTF-16 code units; you see surrogate pairs as two individual JavaScript "characters" even though, in fact, the surrogate pair as a whole is just one character. See the link for details.
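The 13 versus 28 figures can be reproduced without a browser. This is a sketch, not the page from the answer: `encodeUtf16LeWithBom` is a hypothetical helper that builds the bytes I assume Notepad’s "Unicode" option writes (little-endian UTF-16 with a BOM), and `TextDecoder` is told the encoding explicitly, whereas `readAsText` works the encoding out itself.

```javascript
// Hypothetical helper: encode a string as UTF-16LE bytes preceded by a BOM.
function encodeUtf16LeWithBom(str) {
  const bytes = new Uint8Array(2 + str.length * 2);
  bytes[0] = 0xff; // byte order mark, little-endian: FF FE
  bytes[1] = 0xfe;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);   // one UTF-16 code unit
    bytes[2 + i * 2] = unit & 0xff;   // low byte first
    bytes[3 + i * 2] = unit >> 8;     // then high byte
  }
  return bytes;
}

const utf16Bytes = encodeUtf16LeWithBom("Testing 1 2 3");

// Roughly what readAsText yields: decoded characters, BOM removed.
const asText = new TextDecoder("utf-16le").decode(utf16Bytes);

// Roughly what readAsBinaryString yields: one char per raw byte.
const asBinary = Array.from(utf16Bytes, (b) => String.fromCharCode(b)).join("");

console.log(asText.length);   // 13
console.log(asBinary.length); // 28
```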