I know that I can do the following:
>>> import encodings, pprint
>>> pprint.pprint(sorted(set(encodings.aliases.aliases.values())))
['ascii',
'base64_codec',
'big5',
'big5hkscs',
'bz2_codec',
'cp037',
'cp1026',
'cp1140',
'cp1250',
'cp1251',
'cp1252',
'cp1253',
'cp1254',
'cp1255',
'cp1256',
'cp1257',
'cp1258',
'cp424',
'cp437',
'cp500',
'cp775',
'cp850',
'cp852',
'cp855',
'cp857',
'cp860',
'cp861',
'cp862',
'cp863',
'cp864',
'cp865',
'cp866',
'cp869',
'cp932',
'cp949',
'cp950',
'euc_jis_2004',
'euc_jisx0213',
'euc_jp',
'euc_kr',
'gb18030',
'gb2312',
'gbk',
'hex_codec',
'hp_roman8',
'hz',
'iso2022_jp',
'iso2022_jp_1',
'iso2022_jp_2',
'iso2022_jp_2004',
'iso2022_jp_3',
'iso2022_jp_ext',
'iso2022_kr',
'iso8859_10',
'iso8859_11',
'iso8859_13',
'iso8859_14',
'iso8859_15',
'iso8859_16',
'iso8859_2',
'iso8859_3',
'iso8859_4',
'iso8859_5',
'iso8859_6',
'iso8859_7',
'iso8859_8',
'iso8859_9',
'johab',
'koi8_r',
'latin_1',
'mac_cyrillic',
'mac_greek',
'mac_iceland',
'mac_latin2',
'mac_roman',
'mac_turkish',
'mbcs',
'ptcp154',
'quopri_codec',
'rot_13',
'shift_jis',
'shift_jis_2004',
'shift_jisx0213',
'tactis',
'tis_620',
'utf_16',
'utf_16_be',
'utf_16_le',
'utf_32',
'utf_32_be',
'utf_32_le',
'utf_7',
'utf_8',
'uu_codec',
'zlib_codec']
I also know for sure that this is not a complete list, since it includes only encodings for which an alias exists (e.g. “cp737” is missing), and at least some pseudo-encodings are missing (e.g. “string_escape”).
As the title of the question says: how can I programmatically get a list of all codecs/encodings known to Python?
If not programmatically: is there a complete list available online?
I don’t think the complete list is stored anywhere in the Python standard library. Instead, encodings are loaded on demand through calls to encodings.search_function(encoding). If you study the code there, it looks like the encoding string is first normalized, and then the encodings package is searched for a submodule whose name matches the normalized encoding.
One approach is to use pkgutil to list all the submodules of encodings, and then add them to those listed in encodings.aliases.aliases.
Unfortunately, encodings.aliases.aliases contains one encoding, tactis, that is not found by the submodule scan, so I tried to generate the complete list by taking the union of the two sets.
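A minimal sketch of that union, written for Python 3. The names list_known_encodings and NOT_CODECS are my own and purely illustrative. I am also assuming that aliases is the only non-codec module in the package, which may not hold on every Python version.

```python
import encodings
import encodings.aliases
import pkgutil

# Assumption: "aliases" is the only submodule of the encodings package
# that is not itself a codec. Extend this set if your version has others.
NOT_CODECS = {"aliases"}


def list_known_encodings():
    """Return a sorted list of codec names found in the encodings package,
    merged with the targets of encodings.aliases.aliases."""
    # Every non-package submodule of encodings is a candidate codec.
    found = {
        name
        for _finder, name, is_pkg in pkgutil.iter_modules(encodings.__path__)
        if not is_pkg
    }
    found -= NOT_CODECS
    # Add alias targets, which may name codecs with no module of their own.
    found |= set(encodings.aliases.aliases.values())
    return sorted(found)


if __name__ == "__main__":
    for name in list_known_encodings():
        print(name)
```

Note that a name appearing in this list does not guarantee the codec is usable on your platform (mbcs, for example, is Windows-only), and codecs registered at runtime by third-party packages will not show up. Passing each name to codecs.lookup and catching LookupError is one way to filter the result.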