The Archive Base


Editorial Team
Asked: May 29, 2026

I have a UTF-16 CSV file which I have to read. Python csv module does not seem to support UTF-16.

I am using Python 2.7.2. The CSV files I need to parse are huge, running into several GB of data.

Answers to John Machin's questions below:

print repr(open('test.csv', 'rb').read(100))

Output with test.csv having just abc as content

'\xff\xfea\x00b\x00c\x00'
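
As a quick check, the dump above can be read as UTF-16: the leading \xff\xfe is the little-endian byte order mark (BOM), and each ASCII character is followed by a \x00 byte. A minimal Python 3 sketch (the variable names are illustrative):

```python
import codecs

# The bytes printed above for a test.csv containing just "abc".
raw = b'\xff\xfea\x00b\x00c\x00'

# The file starts with the UTF-16 little-endian BOM.
starts_with_le_bom = raw.startswith(codecs.BOM_UTF16_LE)

# The generic 'utf-16' codec consumes the BOM and uses it to pick the byte order.
text = raw.decode('utf-16')
print(starts_with_le_bom, repr(text))
```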

I think the CSV file was created on a Windows machine in the USA. I am using Mac OS X Lion.

If I use the code provided by phihag with a test.csv containing one record, the record is split into two rows.

Sample test.csv content used; below is the output of print repr(open('test.csv', 'rb').read(1000)):

'\xff\xfe1\x00,\x002\x00,\x00G\x00,\x00S\x00,\x00H\x00 \x00f\x00\xfc\x00r\x00 \x00e\x00 \x00\x96\x00 \x00m\x00 \x00\x85\x00,\x00,\x00I\x00\r\x00\n\x00'

Code by phihag

import codecs
import csv
with open('test.csv','rb') as f:
    sr = codecs.StreamRecoder(
        f,
        codecs.getencoder('utf-8'), codecs.getdecoder('utf-8'),
        codecs.getreader('utf-16'), codecs.getwriter('utf-16'))
    for row in csv.reader(sr):
        print row

Output of the above code

['1', '2', 'G', 'S', 'H f\xc3\xbcr e \xc2\x96 m \xc2\x85']
['', '', 'I']

expected output is

['1', '2', 'G', 'S', 'H f\xc3\xbcr e \xc2\x96 m \xc2\x85','','I']
1 Answer

  1. Editorial Team
    Added an answer on May 29, 2026 at 10:51 am

    In Python 2.x, the csv module does not support Unicode input, so it cannot read UTF-16 directly.

    In Python 3.x, csv expects a text-mode file and you can simply use the encoding parameter of open to force another encoding:

    # Python 3.x only
    import csv
    # newline='' lets the csv module handle line endings itself
    with open('utf16.csv', 'r', encoding='utf16', newline='') as csvf:
        for line in csv.reader(csvf):
            print(line) # do something with the line
    

    In Python 2.x, you can recode the input:

    # Python 2.x only
    import codecs
    import csv
    
    class Recoder(object):
        def __init__(self, stream, decoder, encoder, eol='\r\n'):
            self._stream = stream
            self._decoder = decoder if isinstance(decoder, codecs.IncrementalDecoder) else codecs.getincrementaldecoder(decoder)()
            self._encoder = encoder if isinstance(encoder, codecs.IncrementalEncoder) else codecs.getincrementalencoder(encoder)()
            self._buf = ''
            self._eol = eol
            self._reachedEof = False
    
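        def read(self, size=-1):
            # Python 2 file objects reject read(None), so map None to -1 (read to EOF).
            if size is None:
                size = -1
            r = self._stream.read(size)
            raw = self._decoder.decode(r, size < 0)  # finalize only on a full read
            return self._encoder.encode(raw)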
    
        def __iter__(self):
            return self
    
        def __next__(self):
            if self._reachedEof:
                raise StopIteration()
            while True:
                line,eol,rest = self._buf.partition(self._eol)
                if eol == self._eol:
                    self._buf = rest
                    return self._encoder.encode(line + eol)
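                raw = self._stream.read(1024)
                if not raw:
                    # End of input: flush the decoder, then emit any
                    # unterminated last line instead of an empty string.
                    self._buf += self._decoder.decode(b'', True)
                    self._reachedEof = True
                    if not self._buf:
                        raise StopIteration()
                    return self._encoder.encode(self._buf)
                self._buf += self._decoder.decode(raw)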
        next = __next__
    
        def close(self):
            return self._stream.close()
    
    with open('test.csv','rb') as f:
        sr = Recoder(f, 'utf-16', 'utf-8')
    
        for row in csv.reader(sr):
            print (row)
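
The explicit eol in the Recoder above matters for the record in the question. That record contains U+0085 (NEXT LINE), which Python's Unicode line splitting treats as a line boundary, and the codecs stream readers split lines the same way, so the record comes out as two rows. The following Python 3 sketch illustrates the difference using only the sample record from the question; the variable names are illustrative.

```python
import codecs
import csv
import io

# The sample record from the question, encoded as UTF-16 with a BOM.
record = '1,2,G,S,H f\u00fcr e \u0096 m \u0085,,I\r\n'
data = record.encode('utf-16')

# Iterating a codecs StreamReader splits on every Unicode line boundary,
# including U+0085, so the single record is cut in two.
reader = codecs.getreader('utf-16')(io.BytesIO(data))
stream_lines = list(reader)

# Splitting only on the explicit '\r\n' terminator keeps the record whole.
decoded = data.decode('utf-16')
explicit_lines = [part + '\r\n' for part in decoded.split('\r\n') if part]

rows = list(csv.reader(explicit_lines))
print(len(stream_lines), len(explicit_lines), rows)
```

Python 3's text-mode open does not have this problem, because it only recognizes \n, \r and \r\n as line endings.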
    

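    With the utf-16 codec, open and codecs.open require the file to start with a BOM. If it doesn't (or you're on Python 2.x), you can still convert it in memory. Note that this reads the whole file at once, so it only suits files that fit comfortably in memory, not multi-gigabyte inputs: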

    try:
        from io import BytesIO
    except ImportError: # Python < 2.6
        from StringIO import StringIO as BytesIO
    import csv
    with open('utf16.csv', 'rb') as binf:
        c = binf.read().decode('utf-16').encode('utf-8')
    for line in csv.reader(BytesIO(c)):
        print(line) # do something with the line
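
For files too large to convert in memory, one option on Python 3 is to peek at the first two bytes and pick the codec accordingly, then let open stream the file. This is only a sketch: the helper names open_utf16_csv and iter_rows are hypothetical, and the utf-16-le fallback for BOM-less files is an assumption about where the data came from, not something the file itself can confirm.

```python
# Python 3.x only
import codecs
import csv

def open_utf16_csv(path, fallback='utf-16-le'):
    """Open a UTF-16 CSV in text mode, honoring a BOM if one is present.

    `fallback` is an assumed byte order for files without a BOM.
    """
    with open(path, 'rb') as probe:
        head = probe.read(2)
    has_bom = head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
    # The generic 'utf-16' codec consumes the BOM; the explicit
    # little/big-endian codecs expect no BOM at all.
    encoding = 'utf-16' if has_bom else fallback
    return open(path, 'r', encoding=encoding, newline='')

def iter_rows(path, fallback='utf-16-le'):
    """Yield CSV rows one at a time without loading the whole file."""
    with open_utf16_csv(path, fallback) as f:
        for row in csv.reader(f):
            yield row
```

Because iter_rows is a generator, memory use stays roughly constant regardless of file size.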
    
© 2021 The Archive Base. All Rights Reserved