I have a French dictionary file which I got from WinEdt.org (Zip File) .

Question

Asked: May 26, 2026, 09:45:30 UTC

I have a French dictionary file which I got from WinEdt.org (Zip File). I’d like to read this file into memory, but when I do I get the error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in
      position 69: ordinal not in range(128)

I’ve also tried using the codecs module with the encoding utf-8, but that doesn’t work either:

    with codecs.open(self.template_folder_path + "/" + self.test_language + ".txt",
                     'rb', encoding='utf-8') as fp:
        word_list = []

        for line in fp:
            word_list.append(line.strip())

        self.words[self.test_language] = word_list

How can I read this file? I also need to read in a few other dictionary files from that website. How do I go about that?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T09:45:30+00:00

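latin1 (aka ISO-8859-1) is “a snare and a delusion”. Decoding random binary gibberish with latin1 “works” in the sense that it never raises an error, because the latin1 codec maps every one of the 256 possible byte values to a Unicode code point. The absence of an error therefore tells you nothing about whether latin1 is the right encoding.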
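To make that point concrete, here is a small sketch (Python 3 is assumed; the helper name `undecodable_bytes` is made up for illustration) that asks a codec which single byte values it refuses to decode. latin1 refuses none, whereas Python's cp1252 codec leaves five byte values undefined and utf-8 rejects every lone byte above 0x7F.

```python
def undecodable_bytes(encoding):
    """Return every single byte value that `encoding` refuses to decode on its own."""
    rejected = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            rejected.append(value)
    return rejected


print(undecodable_bytes("latin1"))                    # []
print([hex(b) for b in undecodable_bytes("cp1252")])  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
print(len(undecodable_bytes("utf-8")))                # 128
```

A codec that can fail gives you at least some evidence when it succeeds; latin1 gives you none.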

In this case, given the information (1) French and (2) “WinEdt.org” (hello hello, that’s “Win” as in “Windows”), the file is likely to be encoded in cp1252.

    >>> guff = open('fr.dic', 'rb').read()
    >>> z = guff.decode('latin1')
    >>> sum((128 <= ord(c) < 160) for c in z) # count the C1 control characters
    141
    >>> aliens = set(c for c in z if 128 <= ord(c) < 160)
    >>> aliens
    set([u'\x9c'])
    >>> from unicodedata import name
    >>> name(u'\x9c')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: no such name
    >>> name('\x9c'.decode('cp1252'))
    'LATIN SMALL LIGATURE OE'

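QED: the only byte in the C1 range is 0x9C, which latin1 turns into a nameless control character but cp1252 decodes as œ, a letter you would expect to find in French words.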
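The same check can be packaged as a function. The following is only a sketch, assuming Python 3 (where iterating over `bytes` yields integers); the name `c1_report` and the sample bytes are invented for illustration and are not taken from the actual dictionary file. It lists every byte in the 0x80-0x9F range that occurs in the data, together with the name of the character cp1252 assigns to it.

```python
import unicodedata


def c1_report(data):
    """Map each byte in 0x80-0x9F found in `data` to its cp1252 character name."""
    report = {}
    for value in sorted(set(data)):
        if 0x80 <= value < 0xA0:
            try:
                char = bytes([value]).decode("cp1252")
                report[value] = unicodedata.name(char, "<unnamed>")
            except UnicodeDecodeError:
                # Python's cp1252 codec leaves a few byte values undefined.
                report[value] = "<undefined in cp1252>"
    return report


sample = b"c\x9cur\nb\x9cuf\n"  # made-up sample, not the real fr.dic
print(c1_report(sample))        # {156: 'LATIN SMALL LIGATURE OE'}
```

If every entry in the report names a character that is plausible for the language, cp1252 is a reasonable guess; names that look out of place suggest trying a different code page.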

Update: You asked about other files on that website. The first thing to do would be (as the site recommends) to read the .TXT file associated with the dictionary. For example, the large Russian dictionary’s .TXT file says “The dictionary assumes standard Windows Russian codepage (1251)”. Failing that, try the most appropriate code page from this list:

cp1250 eastern European Latin-based scripts e.g. Polish, Czech, Serbian (Latin script)
cp1251 Cyrillic-based scripts e.g. Russian, Ukrainian, Serbian (Cyrillic script)
cp1252 western European Latin-based scripts e.g. German, French
cp1253 Greek
cp1254 Turkish
cp1255 Hebrew
cp1256 Arabic
cp1257 Estonian, Latvian and Lithuanian
cp1258 Vietnamese
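Once you have settled on a code page, reading a dictionary is a matter of passing it as the encoding. The sketch below is one possible arrangement, not a definitive implementation: the `CODEPAGES` table, its language keys, and the function name `read_word_list` are hypothetical, and the table only covers a few rows of the list above. `io.open` is used because it accepts an `encoding` argument on both Python 2.6+ and Python 3.

```python
import io

# Hypothetical lookup table: the language keys are invented for this sketch;
# the code pages come from the list above.
CODEPAGES = {
    "fr": "cp1252",
    "de": "cp1252",
    "pl": "cp1250",
    "ru": "cp1251",
    "el": "cp1253",
    "tr": "cp1254",
}


def read_word_list(path, language):
    """Read one word per line from `path`, decoding with the language's code page."""
    encoding = CODEPAGES[language]
    with io.open(path, "r", encoding=encoding) as fp:
        return [line.strip() for line in fp]
```

For the French file this would be called as `read_word_list("fr.dic", "fr")`. Where the dictionary's .TXT file names a code page, prefer that over the table.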
