I have a file that is uploaded via a regular form_for. This gives me an ActionDispatch::Http::UploadedFile object in the params hash, on which I can call .read to get the content. I now need to embed the file in an XML document. I’m using a regular Ruby string for now to construct the XML. The default encoding for a Rails string is UTF-8.
As a result, I get the error Encoding::UndefinedConversionError: "\x89" from ASCII-8BIT to UTF-8.
This happens for the following files:
what-matters-now-1.pdf: application/octet-stream; charset=binary
example.csv: text/plain; charset=utf-8
investigations.png: image/png; charset=binary
It does not happen for:
my_test.txt: text/plain; charset=us-ascii
I have tried changing the encoding, but I get the same error:
params[:file].read.encode('utf-8')
First, you cannot embed a binary file in an XML document without some sort of conversion to text. At least the PDF document and the PNG image need to be encoded somehow, probably as Base64, before you start trying to treat their contents as strings of characters instead of sequences of bytes.
The UndefinedConversionError indicates that you’re trying to convert the data into UTF-8 from ASCII-8BIT, which is Ruby’s name for raw binary data with no character set attached. Only bytes in the ASCII range (0x00 to 0x7F) can be converted from that encoding, and the source includes a byte whose value is 0x89 (137 decimal), which is outside that range. That is not at all unexpected if the source file is a binary file, and Base64-encoding it will fix that problem.
If, however, the source file generating that error is already text, then you need to determine and specify what character set it is actually using. The 0x89 indicates it is neither ASCII nor UTF-8 (in UTF-8 that value can only be a continuation byte, never the first byte of a character), so the most likely options are Latin-1 or Windows-1252.
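As a rough sketch of both cases, something like the following could work. The helper names embed_as_base64 and to_utf8, the <file> element, and the choice of Windows-1252 as the source character set are all assumptions made for illustration; they are not part of Rails or of your code, and you would need to confirm the real character set of your text files yourself.

```ruby
require "base64"

# Hypothetical helper: wrap raw bytes in an XML element as Base64 text.
# The Base64 alphabet (A-Z, a-z, 0-9, +, /, =) contains no XML special
# characters, so the encoded payload needs no further escaping.
def embed_as_base64(bytes)
  encoded = Base64.strict_encode64(bytes) # single line, ASCII-only
  "<file encoding=\"base64\">#{encoded}</file>"
end

# Hypothetical helper: for a file that really is text, tell Ruby which
# character set the bytes are in, then transcode to UTF-8.
# source_encoding is an assumption the caller has to supply.
def to_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode("UTF-8")
end

# The first bytes of a PNG file, tagged as binary (ASCII-8BIT),
# much like the string returned by params[:file].read.
png_header = "\x89PNG\r\n\x1A\n".b

xml = embed_as_base64(png_header)
puts xml.encoding # the XML string stays UTF-8

# Text case, assuming the file was saved as Windows-1252.
puts to_utf8("caf\xE9".b, "Windows-1252")
```

In the controller this would be called as embed_as_base64(params[:file].read). Whoever consumes the XML has to Base64-decode the element content to get the original bytes back. For the text case, the converted string would still need the usual XML escaping of characters such as < and & before being placed in the document.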