This seems like it should be an easy fix, but so far a solution

Question

Asked: June 4, 2026

This seems like it should be an easy fix, but so far a solution has eluded me. I have a single-column CSV file containing non-ASCII characters, saved as UTF-8, that I want to read in and store in a list. I’m attempting to follow the “Unicode Sandwich” principle and decode upon reading the file in:

import codecs
import csv

with codecs.open('utf8file.csv', 'rU', encoding='utf-8') as file:
    input_file = csv.reader(file, delimiter=",", quotechar='|')
    list = []
    for row in input_file:
        list.extend(row)

This produces the dreaded ‘codec can’t encode characters in position, ordinal not in range(128)’ error.

I’ve also tried adapting a solution from this answer, which returns a similar error:

def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs):
    csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
    for row in csv_reader:
        yield [unicode(cell, 'utf-8') for cell in row]

filename = r'inputs\encode.csv'
reader = unicode_csv_reader(open(filename))
target_list = []
for field1 in reader:
    target_list.extend(field1)

A very similar solution adapted from the docs returns the same error.

def unicode_csv_reader(utf8_data, dialect=csv.excel):
    csv_reader = csv.reader(utf_8_encoder(utf8_data), dialect)
    for row in csv_reader:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')

filename = r'inputs\encode.csv'
reader = unicode_csv_reader(open(filename))
target_list = []
for field1 in reader:
    target_list.extend(field1)

Clearly I’m missing something. Most of the questions that I’ve seen regarding this problem seem to predate Python 2.7, so an update here might be useful.

1 Answer

Editorial Team · Answer 1 · 2026-06-04T08:56:59+00:00

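Your first snippet won’t work. codecs.open() hands unicode objects to the csv reader, which (as documented for Python 2) doesn’t support Unicode input; it tries to convert them to byte strings with the ASCII codec, and that is where the error comes from.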
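
To see where that message comes from, here is a minimal sketch that forces an ASCII encode by hand. It assumes the failing conversion behaves like an explicit encode('ascii') call; the sample string is made up for illustration and is not taken from the question's file.

```python
# Hypothetical sample text; any non-ASCII character triggers the failure.
text = u'caf\xe9'

try:
    # The ASCII codec only covers code points 0-127, so this raises.
    text.encode('ascii')
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

# The message ends with "ordinal not in range(128)".
print(message)
```

The same exception type and wording appear whenever text containing characters outside the ASCII range is pushed through the ASCII codec, whether explicitly or implicitly.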

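Your 2nd and 3rd snippets make this more complicated than it needs to be; the third, for instance, calls encode('utf-8') on lines that are already UTF-8 byte strings. Open the file in binary mode, let the csv reader parse the bytes, and decode each cell afterwards. Something like the following is all that you need: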

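import csv

# Open in binary mode: the Python 2 csv module works on byte strings.
with open('your_utf8_encoded_file.csv', 'rb') as f:
    reader = csv.reader(f)
    for utf8_row in reader:
        # Decode each cell to unicode only after the row has been parsed.
        unicode_row = [x.decode('utf8') for x in utf8_row]
        print unicode_row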
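
For readers on Python 3, where the csv module works on text rather than bytes, the same task might look like the sketch below. It assumes Python 3; the helper name read_single_column and the sample file contents are hypothetical and exist only to make the example self-contained.

```python
import csv
import os
import tempfile


def read_single_column(path):
    """Read a UTF-8 CSV file and return every cell in one flat list of str."""
    values = []
    # newline='' lets the csv module handle line endings itself.
    with open(path, 'r', encoding='utf-8', newline='') as f:
        for row in csv.reader(f):
            values.extend(row)
    return values


# Write a small sample file so the sketch can be run as-is.
sample_path = os.path.join(tempfile.mkdtemp(), 'utf8file.csv')
with open(sample_path, 'w', encoding='utf-8', newline='') as f:
    f.write('caf\xe9\r\nna\xefve\r\n')

result = read_single_column(sample_path)
print(result)
```

Here the decoding happens once, when the file is opened, so the cells arrive as text and no per-cell decode step is needed.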
