Editorial Team
Asked: June 8, 2026

I am trying to import a CSV file into my PHP application built with Drupal. I have encountered a strange situation when importing CSV files exported from Mozilla Thunderbird (I am exporting the address book of contacts). If I export using the Windows version of Thunderbird, multibyte characters are not rendered, and they appear as missing characters when I dump the extracted contents to the screen. However, the problem does not occur with an otherwise identical file created using the Linux version of Thunderbird. In that case everything works perfectly.

To test this I installed the same version of Thunderbird on Linux and on Windows 7. I then created the same single contact (surname: 张, given name: 利) in each address book and exported the address book as a CSV file. As mentioned above, the Linux CSV file imports successfully but the Windows one doesn’t.

If I examine both files on Linux using file --mime myfilename.csv, I get the following output:

LinuxTB14.csv: text/plain; charset=utf-8

WinTB14.csv: text/plain; charset=iso-8859-1

So the Windows file, even though it contains Chinese characters, is reported as iso-8859-1. After discovering this, I assumed that it was an encoding issue and that I just needed to tell PHP to convert the offending content to UTF-8.

The problem is that PHP appears to detect the encoding differently, in a way that I can’t understand.
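Because file --mime only guesses from the bytes it sees, it can help to look at the raw bytes directly. The following is a minimal sketch, not part of the original post: non_ascii_runs_hex() is a hypothetical helper, and the two sample lines are assumed stand-ins for a row of each export (the UTF-8 bytes of 张, and the bytes D5 C5, which is how 张 is encoded in GBK).

```php
<?php
// Hypothetical helper: return the hex of every run of non-ASCII bytes,
// so the raw bytes can be compared against known encodings by eye.
function non_ascii_runs_hex(string $bytes): array {
  // No /u modifier, so the pattern matches raw bytes, not code points.
  preg_match_all('/[\x80-\xFF]+/', $bytes, $matches);
  return array_map('bin2hex', $matches[0]);
}

// Assumed sample rows: the same surname as UTF-8 and as GBK bytes.
$utf8_line = "Surname,\xE5\xBC\xA0\n";
$gbk_line  = "Surname,\xD5\xC5\n";

print_r(non_ascii_runs_hex($utf8_line));
print_r(non_ascii_runs_hex($gbk_line));
```

Running the helper over the first few lines of each real file (for example on the result of file_get_contents()) shows whether the two exports actually contain different bytes for the same name.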

// Set a UTF-8 locale so that fgetcsv() handles multibyte characters correctly.
$original_locale = setlocale(LC_CTYPE, '0');
if ($original_locale !== 'en_US.UTF-8') {
  setlocale(LC_CTYPE, 'en_US.UTF-8');
}
$handle = fopen($file->uri, 'r');
$cardinfo = array();
while (($data = fgetcsv($handle, 5000, ',')) !== FALSE) {
  $cardinfo[] = $data;
  // dsm() is a Drupal (Devel module) function that prints its argument to the screen.
  dsm(mb_detect_encoding($data[0]));
  dsm($data[0]);
}
fclose($handle);

If I include the above code, which shows the encoding and content of the first value in each line of the CSV file, I get the following rendered to the screen:

For the CSV created by Thunderbird on Windows:

ASCII

First Name

UTF-8

(nothing is displayed)

For the CSV created by Thunderbird on Linux:

ASCII

First Name

UTF-8

利

As you can see PHP is reporting the same encoding for both files, even though the Chinese character in the Windows file is not being printed to screen.

Anyone have any ideas what might be going on here?

EDIT

If I open the Windows CSV file in Notepad and use Save As with UTF-8 encoding, the file imports correctly. So it is obviously an encoding issue. I have added the following code to convert the file to UTF-8 if it is not already UTF-8.

  // Read the whole upload and guess its encoding from a list of candidates.
  $file_contents = file_get_contents($file->uri);
  $file_encoding = mb_detect_encoding($file_contents, 'UTF-8, ISO-8859-1, WINDOWS-1252');
  if ($file_encoding !== 'UTF-8') {
    // Convert to UTF-8 and overwrite the uploaded file.
    $file_contents = iconv($file_encoding, 'UTF-8', $file_contents);
    $handle = fopen($file->uri, 'w');
    fwrite($handle, $file_contents);
    fclose($handle);
  }

This partially fixes the problem. The characters are appearing, but they are garbled (e.g. 张 appears as ÕÅ). I checked the page encoding of my browser and the page headers and both are set to UTF-8, so it is not a browser issue.
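The garbled output is itself a clue. Õ and Å are how the bytes D5 C5 read as ISO-8859-1, and D5 C5 is the GBK (code page 936) encoding of 张. The sketch below assumes, without confirmation from the file itself, that the Windows export was written in the legacy code page CP936 and that the mbstring extension is available; to_utf8_from() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert bytes to UTF-8 from an encoding the caller
// states explicitly, instead of relying on mb_detect_encoding().
function to_utf8_from(string $bytes, string $source_encoding): string {
  return mb_convert_encoding($bytes, 'UTF-8', $source_encoding);
}

// Bytes assumed to be what the Windows-exported file holds for the surname.
$raw = "\xD5\xC5";

// Treating the bytes as ISO-8859-1 reproduces the garbling described above.
echo to_utf8_from($raw, 'ISO-8859-1'), "\n";
// Treating them as CP936 (GBK) recovers the intended character.
echo to_utf8_from($raw, 'CP936'), "\n";
```

If this assumption holds, converting as ISO-8859-1 can never work here, because the conversion succeeds but maps each byte to the wrong character. The source encoding has to be known, not guessed.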

Any ideas?

1 Answer

  1. Editorial Team
    Added an answer on June 8, 2026 at 8:45 am

    The only solution I have come up with for this issue is to not try to detect and convert the encoding of the uploaded file in the first place. After much research, it appears that reliable encoding detection does not really exist; there is just too much room for error.
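To illustrate why detection is unreliable, here is a small sketch showing that one byte sequence can be well-formed in several encodings at once. It assumes the mbstring extension; plausible_encodings() is a hypothetical helper, and the candidate list is illustrative only.

```php
<?php
// Hypothetical helper: list which candidate encodings accept the bytes
// as well-formed. Being well-formed does not mean the guess is correct.
function plausible_encodings(string $bytes, array $candidates): array {
  return array_values(array_filter(
    $candidates,
    fn(string $enc): bool => mb_check_encoding($bytes, $enc)
  ));
}

// The two bytes discussed in the question (shown there as ÕÅ).
$bytes = "\xD5\xC5";
print_r(plausible_encodings($bytes, ['UTF-8', 'ISO-8859-1', 'CP936']));
```

The bytes are rejected as UTF-8 but accepted as both ISO-8859-1 and CP936, so a detector has no way to choose between the latter two from the bytes alone.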

    The safest option is to require that the uploaded file is encoded in UTF-8, since it is possible to check reliably whether content is valid UTF-8. The following code is how I validate the UTF-8 encoding.

    $file_content = file_get_contents($file->uri);
    // Create regex pattern which detects UTF-8 encoding.
    $regex = '%^(?:
      [\x09\x0A\x0D\x20-\x7E]              # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
      | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
      | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$%xs';
    if (!preg_match($regex, $file_content)) {
      // Not valid UTF-8 encoding so flag an error.
    }
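As an alternative to the regular expression, and not the method used in this answer, the same check can be sketched with mbstring's mb_check_encoding(). This assumes the mbstring extension is enabled; is_valid_utf8() is a hypothetical wrapper name.

```php
<?php
// Hypothetical wrapper: TRUE when the string is well-formed UTF-8.
function is_valid_utf8(string $content): bool {
  return mb_check_encoding($content, 'UTF-8');
}

// UTF-8 bytes for 张, as in the Linux export.
var_dump(is_valid_utf8("First Name,\xE5\xBC\xA0"));
// The same name in a legacy encoding; the upload should be rejected.
var_dump(is_valid_utf8("First Name,\xD5\xC5"));
```

In the import handler this would be called on the result of file_get_contents($file->uri), flagging an error when it returns FALSE.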
    