When I use iconv to convert from UTF16 to UTF8 then all is fine

Question

Asked: May 28, 2026

When I use iconv to convert from UTF-16 to UTF-8, everything works, but the reverse conversion does not.
I have these files:

a-16.strings:    Little-endian UTF-16 Unicode c program text
a-8.strings:     UTF-8 Unicode c program text, with very long lines

The text looks fine in an editor. When I run this:

iconv -f UTF-8 -t UTF-16LE a-8.strings > b-16.strings

Then I get this result:

b-16.strings:    data
a-16.strings:    Little-endian UTF-16 Unicode c program text
a-8.strings:     UTF-8 Unicode c program text, with very long lines

The file utility does not report the expected format, and the text does not display correctly in an editor either. Could it be that iconv does not write a proper BOM? I am running it on the Mac command line.

Why is b-16.strings not in proper UTF-16LE format? Is there another way to convert UTF-8 to UTF-16?

More details are below.

$ iconv -f UTF-8 -t UTF-16LE a-8.strings > b-16le-BAD-fromUTF8.strings
$ iconv -f UTF-8 -t UTF-16 a-8.strings > b-16be.strings 
$ iconv -f UTF-16 -t UTF-16LE b-16be.strings > b-16le-BAD-fromUTF16BE.strings

$ file *s
a-16.strings:                   Little-endian UTF-16 Unicode c program text, with very long lines
a-8.strings:                    UTF-8 Unicode c program text, with very long lines
b-16be.strings:                 Big-endian UTF-16 Unicode c program text, with very long lines
b-16le-BAD-fromUTF16BE.strings: data
b-16le-BAD-fromUTF8.strings:    data


$ od -c a-16.strings | head
0000000  377 376   /  \0   *  \0      \0  \f 001   E  \0   S  \0   K  \0

$ od -c a-8.strings | head 
0000000    /   *   *   *       Č  **   E   S   K   Y       (   J   V   O

$ od -c b-16be.strings | head
0000000  376 377  \0   /  \0   *  \0   *  \0   *  \0     001  \f  \0   E

$ od -c b-16le-BAD-fromUTF16BE.strings | head                                
0000000    /  \0   *  \0   *  \0   *  \0      \0  \f 001   E  \0   S  \0

$ od -c b-16le-BAD-fromUTF8.strings | head
0000000    /  \0   *  \0   *  \0   *  \0      \0  \f 001   E  \0   S  \0

Clearly, the BOM is missing whenever I convert to UTF-16LE.
Any help with this?

1 Answer

Editorial Team · Answer 1 · 2026-05-28T06:46:00+00:00

UTF-16LE tells iconv to generate little-endian UTF-16 without a BOM (Byte Order Mark). Apparently it assumes that since you specified LE, the BOM isn’t necessary.

UTF-16 tells it to generate UTF-16 text with a BOM, typically in the local machine’s byte order, although some iconv implementations always emit big-endian.
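The difference is easy to see by converting a single character and dumping the bytes. This is only a minimal sketch: the scratch directory and the file name one-char.txt are made up for illustration, and the byte order chosen for plain UTF-16 depends on the iconv implementation.

```shell
# Scratch directory and file name are hypothetical, for illustration only.
tmpdir=$(mktemp -d)
printf 'A' > "$tmpdir/one-char.txt"

# Explicit little-endian: just the code unit 41 00, with no BOM in front.
le_bytes=$(iconv -f UTF-8 -t UTF-16LE "$tmpdir/one-char.txt" | od -An -tx1 | tr -d ' \n')

# Generic UTF-16: usually a BOM (fe ff or ff fe) followed by the code unit;
# which byte order you get depends on the iconv implementation.
generic_bytes=$(iconv -f UTF-8 -t UTF-16 "$tmpdir/one-char.txt" | od -An -tx1 | tr -d ' \n')

echo "UTF-16LE: $le_bytes"
echo "UTF-16:   $generic_bytes"
```

If the second line starts with feff, the output is big-endian; if it starts with fffe, it is little-endian.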

If you’re on a little-endian machine, I don’t see a way to tell iconv to generate big-endian UTF-16 with a BOM, but I might just be missing something.

I find that the file command doesn’t recognize UTF-16 text without a BOM, and your editor might not either. But if you run iconv -f UTF-16LE -t UTF-8 b-16.strings, you should get a valid UTF-8 version of the original file.

Try running od -c on the files to see their actual contents.
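One way to confirm that a BOM-less UTF-16LE file is still valid is a round trip back to UTF-8 followed by a byte-for-byte comparison. The sketch below uses a throwaway sample file rather than the .strings files from the question; all file names in it are assumptions.

```shell
# Scratch directory and file names are hypothetical, for illustration only.
tmpdir=$(mktemp -d)
printf 'caf\303\251\n' > "$tmpdir/orig-utf8.txt"   # "café" encoded as UTF-8

# UTF-8 -> UTF-16LE (no BOM), then back to UTF-8.
iconv -f UTF-8 -t UTF-16LE "$tmpdir/orig-utf8.txt" > "$tmpdir/nobom-utf16le.txt"
iconv -f UTF-16LE -t UTF-8 "$tmpdir/nobom-utf16le.txt" > "$tmpdir/roundtrip-utf8.txt"

# cmp -s is silent and only sets the exit status.
if cmp -s "$tmpdir/orig-utf8.txt" "$tmpdir/roundtrip-utf8.txt"; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "round trip: $roundtrip"

# Look at the raw bytes of the intermediate file.
od -c "$tmpdir/nobom-utf16le.txt" | head -n 2
```

If the round trip is identical, the conversion itself was fine and only the missing BOM keeps file and the editor from recognizing the encoding.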

UPDATE:

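Your od output shows that -t UTF-16 produces big-endian text with a BOM on your system, either because the machine is big-endian (x86 is little-endian) or because your iconv defaults to big-endian for UTF-16. You’re trying to generate a little-endian UTF-16 file with a BOM. Is that correct? As far as I can tell, iconv won’t do that directly. But this should work: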

( printf "\xff\xfe" ; iconv -f utf-8 -t utf-16le UTF-8-FILE ) > UTF-16-FILE

The behavior of the printf might depend on your locale settings; I have LANG=en_US.UTF-8.

(Can anyone suggest a more elegant solution?)
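A variant of the same idea writes the BOM with octal escapes, which the POSIX printf utility defines, instead of \x escapes, which it does not. This is a sketch under the assumption that your shell's printf emits the two bytes unchanged; the sample file names are invented for the example.

```shell
# Scratch directory and file names are hypothetical, for illustration only.
tmpdir=$(mktemp -d)
printf 'caf\303\251\n' > "$tmpdir/sample-utf8.txt"   # "café" encoded as UTF-8

# Write the UTF-16LE BOM (ff fe) as octal 377 376, then the BOM-less conversion.
{
    printf '\377\376'
    iconv -f UTF-8 -t UTF-16LE "$tmpdir/sample-utf8.txt"
} > "$tmpdir/sample-utf16le.txt"

# Check the first two bytes of the result.
bom=$(head -c 2 "$tmpdir/sample-utf16le.txt" | od -An -tx1 | tr -d ' \n')
echo "first two bytes: $bom"
```

Grouping with braces instead of parentheses avoids an extra subshell, but either form works.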

Another workaround, if you know the endianness of the output produced by -t utf-16:

iconv -f utf-8 -t utf-16 UTF-8-FILE | dd conv=swab 2>/dev/null
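Byte swapping only yields little-endian output when the input really is big-endian, so it is worth checking the first two bytes afterwards. The sketch below builds a big-endian input by hand so that the starting byte order is known; the file names are assumptions made for the example.

```shell
# Scratch directory and file names are hypothetical, for illustration only.
tmpdir=$(mktemp -d)
printf 'A' > "$tmpdir/swab-in.txt"

# Build big-endian UTF-16 with a BOM by hand (fe ff, then 00 41), so the
# starting byte order does not depend on what -t utf-16 would pick.
{
    printf '\376\377'
    iconv -f UTF-8 -t UTF-16BE "$tmpdir/swab-in.txt"
} > "$tmpdir/be-with-bom.txt"

# Swap each pair of bytes: fe ff 00 41 becomes ff fe 41 00.
dd conv=swab < "$tmpdir/be-with-bom.txt" 2>/dev/null > "$tmpdir/le-with-bom.txt"

swapped=$(od -An -tx1 "$tmpdir/le-with-bom.txt" | tr -d ' \n')
echo "after swab: $swapped"
```

If the result started with feff instead, the input was already little-endian and the swap should be skipped.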
