Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 669469
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 14, 20262026-05-14T00:09:16+00:00 2026-05-14T00:09:16+00:00

I came across a problem I cannot solve on my own concerning the downloadable

  • 0

I came across a problem I cannot solve on my own concerning the downloadable csv formatted trends data files from Google Insights for Search.

I’m to lazy to reformat the files I4S gives me manually what means: Extracting the section with the actual trends data and reformatting the columns so that I can use it with a modelling program I do for school.

So I wrote a tiny script the should do the work for me: Taking a file, do some magic and give me a new file in proper format.

What it’s supposed to do is reading the file contents, extracting the trends section, splitting it by newlines, splitting each line and then reorder the columns and maybe reformat them.

When looking at a untouched I4S csv file it looks normal containing CR LF caracters at line breaks (maybe thats only because I’m using Windows).

When just reading the contents and then writing them to a new file using the script wierd asian characters appear between CR and LF. I tried the script with a manually written similar looking file and even tried a csv file from Google Trends and it works fine.

I use Python and the script (snippet) I used for the following example
looks like this:

            # Read from an input file 
            file = open(file,"r") 
            contents = file.read() 
            file.close() 
            cfile = open("m.log","w+") 
            cfile.write(contents) 
            cfile.close()

Has anybody an idea why those characters appear??? Thank you for you help!

I’ll give you and example:

First few lines of I4S csv file:

Web Search Interest: foobar
Worldwide; 2004 - present

Interest over time
Week foobar
2004-01-04 - 2004-01-10 44
2004-01-11 - 2004-01-17 44
2004-01-18 - 2004-01-24 37
2004-01-25 - 2004-01-31 40
2004-02-01 - 2004-02-07 49
2004-02-08 - 2004-02-14 51
2004-02-15 - 2004-02-21 45
2004-02-22 - 2004-02-28 61
2004-02-29 - 2004-03-06 51
2004-03-07 - 2004-03-13 48
2004-03-14 - 2004-03-20 50
2004-03-21 - 2004-03-27 56
2004-03-28 - 2004-04-03 59

Output file when reading and writing contents:

Web Search Interest: foobar
਍圀漀爀氀搀眀椀搀攀㬀 ㈀  㐀 ⴀ 瀀爀攀猀攀渀琀ഀഀ

਍䤀渀琀攀爀攀猀琀 漀瘀攀爀 琀椀洀攀ഀഀ
Week foobar
਍㈀  㐀ⴀ ㄀ⴀ 㐀 ⴀ ㈀  㐀ⴀ ㄀ⴀ㄀ ऀ㐀㐀ഀഀ
2004-01-11 - 2004-01-17 44
਍㈀  㐀ⴀ ㄀ⴀ㄀㠀 ⴀ ㈀  㐀ⴀ ㄀ⴀ㈀㐀ऀ㌀㜀ഀഀ
2004-01-25 - 2004-01-31 40
਍㈀  㐀ⴀ ㈀ⴀ ㄀ ⴀ ㈀  㐀ⴀ ㈀ⴀ 㜀ऀ㐀㤀ഀഀ
2004-02-08 - 2004-02-14 51
਍㈀  㐀ⴀ ㈀ⴀ㄀㔀 ⴀ ㈀  㐀ⴀ ㈀ⴀ㈀㄀ऀ㐀㔀ഀഀ
2004-02-22 - 2004-02-28 61
਍㈀  㐀ⴀ ㈀ⴀ㈀㤀 ⴀ ㈀  㐀ⴀ ㌀ⴀ 㘀ऀ㔀㄀ഀഀ
2004-03-07 - 2004-03-13 48
਍㈀  㐀ⴀ ㌀ⴀ㄀㐀 ⴀ ㈀  㐀ⴀ ㌀ⴀ㈀ ऀ㔀 ഀഀ
2004-03-21 - 2004-03-27 56
਍㈀  㐀ⴀ ㌀ⴀ㈀㠀 ⴀ ㈀  㐀ⴀ 㐀ⴀ ㌀ऀ㔀㤀ഀഀ
2004-04-04 - 2004-04-10 69
਍㈀  㐀ⴀ 㐀ⴀ㄀㄀ ⴀ ㈀  㐀ⴀ 㐀ⴀ㄀㜀ऀ㘀㔀ഀഀ
2004-04-18 - 2004-04-24 51
਍㈀  㐀ⴀ 㐀ⴀ㈀㔀 ⴀ ㈀  㐀ⴀ 㔀ⴀ ㄀ऀ㔀㄀ഀഀ
2004-05-02 - 2004-05-08 56
਍㈀  㐀ⴀ 㔀ⴀ 㤀 ⴀ ㈀  㐀ⴀ 㔀ⴀ㄀㔀ऀ㔀㈀ഀഀ
2004-05-16 - 2004-05-22 54
਍㈀  㐀ⴀ 㔀ⴀ㈀㌀ ⴀ ㈀  㐀ⴀ 㔀ⴀ㈀㤀ऀ㔀㔀ഀഀ
2004-05-30 - 2004-06-05 74
਍㈀  㐀ⴀ 㘀ⴀ 㘀 ⴀ ㈀  㐀ⴀ 㘀ⴀ㄀㈀ऀ㔀㜀ഀഀ
2004-06-13 - 2004-06-19 50
਍㈀  㐀ⴀ 㘀ⴀ㈀  ⴀ ㈀  㐀ⴀ 㘀ⴀ㈀㘀ऀ㔀㐀ഀഀ
2004-06-27 - 2004-07-03 58
਍㈀  㐀ⴀ 㜀ⴀ 㐀 ⴀ ㈀  㐀ⴀ 㜀ⴀ㄀ ऀ㔀㤀ഀഀ
2004-07-11 - 2004-07-17 59
਍㈀  㐀ⴀ 㜀ⴀ㄀㠀 ⴀ ㈀  㐀ⴀ 㜀ⴀ㈀㐀ऀ㘀㈀ഀഀ

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-14T00:09:16+00:00Added an answer on May 14, 2026 at 12:09 am

    repr() is your friend (except on Python 3.X; use ascii() instead).

    prompt>\python26\python -c "print repr(open('report.csv','rb').read()[:300])"
    '\xff\xfeW\x00e\x00b\x00 \x00S\x00e\x00a\x00r\x00c\x00h\x00 \x00I\x00n\x00t\x00e
    \x00r\x00e\x00s\x00t\x00:\x00 \x00f\x00o\x00o\x00b\x00a\x00r\x00\r\x00\n\x00W\x0
    [snip]
    x001\x007\x00\t\x004\x004\x00\r\x00\n\x002\x000\x00'
    

    Sure looks like a UTF-16LE BOM (U+FEFF) in the 1st two bytes to me.

    Notepad.* are NOT your friends. UTF-16 should not be referred to as “UCS-2” or “Unicode”.

    The following should help with what to do next:

    >>> import codecs
    >>> lines = list(codecs.open('report.csv', 'r', encoding='UTF-16'))
    >>> import pprint
    >>> pprint.pprint(lines[:8])
    [u'Web Search Interest: foobar\r\n',
     u'Worldwide; 2004 - present\r\n',
     u'\r\n',
     u'Interest over time\r\n',
     u'Week\tfoobar\r\n',
     u'2004-01-04 - 2004-01-10\t44\r\n',
     u'2004-01-11 - 2004-01-17\t44\r\n',
     u'2004-01-18 - 2004-01-24\t37\r\n']
    >>>
    

    Update: Why your output file looks like gobbledegook.

    Firstly, you are looking at the files with something (Notepad.* maybe) that knows that the files are allegedly encoded in UTF-16LE, and displays them accordingly. So your input file looks fine.

    However, your script is reading the input file as raw bytes. It then writes the output file as raw bytes in text mode (‘w’) (as opposed to binary mode (‘wb’)). Because you are on Windows, every \n will be replaced by \r\n. This is adding one byte (HALF of a UTF-16 character) to every line. So every SECOND line will be bassackwards aka UTF-16BE … the letter A which is \x41\x00 in UTF-16LE will lose its trailing \x00 and pick up a leading byte (probably \x00) from the character to the left. \x00\x41 is the UTF-16LE for a CJK (“Asian”) character.

    Suggested reading: the Python Unicode HOWTO and this piece by Joel.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

recently i came across strange problem while uploading files to my new server. I
I have recenty came across a problem while importing older data than my current
I came across a problem in my current application that required fiddling with the
I came across this problem while preparing for an interview and curious to know
Today I came across a problem where someone had accidentally committed a proj.user file
I am new to creating Java web applications and came across this problem when
After getting fine answer to my previous question , I came across another problem.
So I came across an interesting problem today. We have a WCF web service
While investigating a reported problem with my site loading slowly, I came across an
I'm still trying to debug a very sneaky memory corruption problem. I came across

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.