I need to make sure that a file is encoded as UTF-8.
So, I create the file:
c:\> gvim umlaute.txt
In VIM I type the Umlaute:
äöü
I check the encoding …
:set enc
(VIM echoes encoding=latin1)
and then I check the file encoding …
:set fenc
(VIM echoes fileencoding=)
Then I write the file
:w
And check the file’s size on the hard disk:
:!dir umlaute.txt
(The size is 5 bytes.) That is of course expected: 3 bytes for the text and 2 for the \x0d \x0a.
OK, so now I set the encoding to UTF-8:
:set enc=utf8
The buffer gets weird:
<e4><f6><fc>
I guess this is the hex representation of the characters I previously typed in. So I rewrite them:
äöü
Writing, checking size:
:w
:!dir umlaute.txt
This time, it’s 8 bytes. I guess that makes sense: 2 bytes for every character plus \x0d \x0a.
OK, so I want to make sure that the next time I open the file it will be opened with encoding=utf8.
:set bomb
:w
:!dir umlaute.txt
11 bytes. This is of course the previous 8 bytes + 3 bytes for the BOM (ef bb bf).
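The three sizes can be double-checked outside VIM. The following is only a minimal Python sketch, assuming the file holds the single line äöü followed by a Windows line ending (0d 0a); the variable names are made up for illustration.

```python
import codecs

# "äöü" written with escapes so the example does not depend on the
# encoding of this source file.
text = "\u00e4\u00f6\u00fc"
crlf = b"\r\n"  # assumed Windows line ending: 0d 0a

latin1_file = text.encode("latin-1") + crlf      # 1 byte per umlaut
utf8_file = text.encode("utf-8") + crlf          # 2 bytes per umlaut
utf8_bom_file = codecs.BOM_UTF8 + utf8_file      # ef bb bf in front

print(len(latin1_file))    # 5
print(len(utf8_file))      # 8
print(len(utf8_bom_file))  # 11
```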
So I
:quit
VIM, open the file again,
and check whether the encoding is set:
:set enc
But VIM insists it is encoding=latin1.
So why is that? I would have expected the BOM to tell VIM that this is a UTF-8 file.
You are confusing 'encoding', which is a Vim global setting, and 'fileencoding', which is a local setting to each buffer.
When opening a file, the variable 'fileencodings' (note the final s) determines what encodings Vim will try to open the file with. If it starts with ucs-bom then any file with a BOM will be properly opened if it parses correctly.
If you want to change the encoding of a file, you should use :set fenc=<foo>. If you want to add or remove the BOM you should use :set [no]bomb. Then use :w to save.
Avoid changing enc after having opened a buffer; it could mess things up. enc determines what characters Vim can work with, and it has nothing to do with the files that you are working with.
Details
You are opening Vim with a nonexistent file name. Vim creates a buffer, gives it that name, and sets fenc to an empty value since there is no file associated with it.
The encoding=latin1 that Vim echoes means that it stores the buffer contents in ISO-8859-1 (maybe another number).
The empty fileencoding is normal; there is no file for the moment.
Since 'fileencoding' is empty, :w writes the buffer to the disk using the internal encoding, latin1.
Setting enc=utf8 at this point is WRONG! You are telling Vim that it must interpret the buffer contents as UTF-8 content. The buffer contains, in hexadecimal, e4 f6 fc 0d 0a; the first three bytes are invalid UTF-8 character sequences. You should have typed :set fenc=utf-8. This would have converted the buffer.
The <e4><f6><fc> display is what happens when you force Vim to interpret an illegal UTF-8 file as UTF-8.
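The difference between reinterpreting the bytes and converting them can be shown outside Vim. This is a small Python sketch of the byte-level effect only, not of how Vim works internally; the variable names are made up for illustration.

```python
# "äöü" as it is stored in latin1: one byte per character.
raw = bytes([0xE4, 0xF6, 0xFC])

# Reinterpreting the same bytes as UTF-8 fails, because e4 announces a
# multi-byte sequence and f6 is not a valid continuation byte.
try:
    raw.decode("utf-8")
    reinterpret_ok = True
except UnicodeDecodeError:
    reinterpret_ok = False

# Converting means decoding with the encoding the bytes really have and
# then re-encoding, which is roughly the effect of :set fenc=utf-8 plus :w.
converted = raw.decode("latin-1").encode("utf-8")

print(reinterpret_ok)   # False
print(converted.hex())  # c3a4c3b6c3bc
```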
You should run :set fenc? to know what the detected encoding of your file is. And if you want Vim to be able to work with Unicode files, you should set 'enc' to utf-8 in your vimrc.
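As a rough analogy for how a 'fileencodings' list is tried in order, here is a Python sketch. It assumes a list along the lines of ucs-bom,utf-8,latin1 (that value is an assumption, not taken from your setup), it only checks for the UTF-8 BOM, and detect_encoding is a hypothetical helper, not anything Vim provides.

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Hypothetical helper: try a BOM first, then strict UTF-8, then latin1."""
    # Step 1 (like ucs-bom): a leading ef bb bf marks the file as UTF-8.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (with BOM)"
    # Step 2 (like utf-8): accept only if every byte sequence is valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Step 3 (like latin1): every byte string is valid latin1, so this
        # fallback never fails and must come last.
        return "latin1"


print(detect_encoding(b"\xef\xbb\xbf\xc3\xa4\xc3\xb6\xc3\xbc\r\n"))  # utf-8 (with BOM)
print(detect_encoding(b"\xc3\xa4\xc3\xb6\xc3\xbc\r\n"))              # utf-8
print(detect_encoding(b"\xe4\xf6\xfc\r\n"))                          # latin1
```

The order matters: because the last candidate accepts any input, anything listed after it would never be tried.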