
The Archive Base


Editorial Team
Asked: May 20, 2026


I'm developing a PHP script that parses data from xls files using the phpexcelreader library. It mostly works, but I stumbled upon a strange problem: some files are parsed incorrectly. It looks like xls files may use different character encodings internally. At least, when I pipe the output of my script through iconv -f cp1251 -t utf8, the strings come out correct.

Phpexcelreader has an option for specifying the output encoding, but it seems to lack the ability to detect the input encoding. Any ideas?


1 Answer

  1. Editorial Team
    Added an answer on May 20, 2026 at 5:49 pm

    The _defaultEncoding property of the workbook object can be set to the charset used by the Excel file, and the reader then uses it to handle conversion to UTF-16LE. However, the reader makes no effort to identify the file's internal charset itself.

    Add the following constant among the other SPREADSHEET_EXCEL_READER_TYPE definitions:

    define('SPREADSHEET_EXCEL_READER_TYPE_CODEPAGE',  0x0042);
    

    Then modify the switch statement starting at line 464 to include a case for SPREADSHEET_EXCEL_READER_TYPE_CODEPAGE. The logic for this case needs to be something like:

    $length = $this->_GetInt2d($this->_data, $pos + 2);
    $recordData = substr($this->_data, $pos + 4, $length);
    
    // move stream pointer to next record
    $pos += 4 + $length;
    
    // offset: 0; size: 2; code page identifier
    $codepage = $this->_GetInt2d($recordData, 0);
    $codepage = $this->_CodePageNumberToName($codepage);
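
The record walk above can be tried outside the reader. The following is a minimal sketch, not the library's code: findCodepage() is a hypothetical helper, it assumes $stream already holds the raw workbook record stream, and it uses PHP's built-in unpack() in place of _GetInt2d. The sample stream is synthetic, with an arbitrary first record type.

```php
<?php
// Sketch only: findCodepage() is a hypothetical helper, not part of the reader.
// Assumes $stream is the raw workbook record stream.
const BIFF_TYPE_CODEPAGE = 0x0042; // same value as the define above

function findCodepage(string $stream): ?int
{
    $pos = 0;
    $end = strlen($stream);
    while ($pos + 4 <= $end) {
        // Each record starts with a 2-byte type and a 2-byte length, little-endian.
        $header = unpack('vtype/vlength', substr($stream, $pos, 4));
        $data = substr($stream, $pos + 4, $header['length']);
        if ($header['type'] === BIFF_TYPE_CODEPAGE && strlen($data) >= 2) {
            // offset: 0; size: 2; code page identifier
            return unpack('v', $data)[1];
        }
        // move stream pointer to next record
        $pos += 4 + $header['length'];
    }
    return null; // no CODEPAGE record found
}

// Synthetic stream: one arbitrary 4-byte record, then CODEPAGE = 1251.
$stream = pack('vvV', 0x0099, 4, 0) . pack('vvv', BIFF_TYPE_CODEPAGE, 2, 1251);
var_dump(findCodepage($stream)); // int(1251)
```

Returning null when no record is found leaves the choice of a fallback encoding to the caller.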
    

    Recreate the _GetInt2d method (that seems to have been stripped from the code at some point) as

    function _GetInt2d($data, $pos)
    {
        return ord($data[$pos]) | (ord($data[$pos + 1]) << 8);
    }
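
To see what the method computes, here is a small standalone check. It is only an illustration: getInt2d() is a free-function copy of the method body, and the two sample bytes are made up to encode the value 1251.

```php
<?php
// Standalone copy of the method body, named getInt2d() here for illustration.
function getInt2d(string $data, int $pos): int
{
    // Low byte first, then the high byte shifted up by 8 bits (little-endian).
    return ord($data[$pos]) | (ord($data[$pos + 1]) << 8);
}

$bytes = "\xE3\x04";            // 0x04E3 stored low byte first
echo getInt2d($bytes, 0), "\n"; // 1251

// PHP's unpack() with format 'v' reads the same unsigned 16-bit little-endian value.
var_dump(getInt2d($bytes, 0) === unpack('v', $bytes)[1]); // bool(true)
```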
    

    and create a _CodePageNumberToName method to return the codepage name from its numeric value:

    function _CodePageNumberToName($codePage = 1252)
    {
        // Each case returns or throws, so no break statements are needed.
        switch ($codePage) {
            case 367:   return 'ASCII';     //  ASCII
            case 437:   return 'CP437';     //  OEM US
            case 720:   throw new Exception('Code page 720 not supported.');    //  OEM Arabic
            case 737:   return 'CP737';     //  OEM Greek
            case 775:   return 'CP775';     //  OEM Baltic
            case 850:   return 'CP850';     //  OEM Latin I
            case 852:   return 'CP852';     //  OEM Latin II (Central European)
            case 855:   return 'CP855';     //  OEM Cyrillic
            case 857:   return 'CP857';     //  OEM Turkish
            case 858:   return 'CP858';     //  OEM Multilingual Latin I with Euro
            case 860:   return 'CP860';     //  OEM Portuguese
            case 861:   return 'CP861';     //  OEM Icelandic
            case 862:   return 'CP862';     //  OEM Hebrew
            case 863:   return 'CP863';     //  OEM Canadian (French)
            case 864:   return 'CP864';     //  OEM Arabic
            case 865:   return 'CP865';     //  OEM Nordic
            case 866:   return 'CP866';     //  OEM Cyrillic (Russian)
            case 869:   return 'CP869';     //  OEM Greek (Modern)
            case 874:   return 'CP874';     //  ANSI Thai
            case 932:   return 'CP932';     //  ANSI Japanese Shift-JIS
            case 936:   return 'CP936';     //  ANSI Chinese Simplified GBK
            case 949:   return 'CP949';     //  ANSI Korean (Wansung)
            case 950:   return 'CP950';     //  ANSI Chinese Traditional BIG5
            case 1200:  return 'UTF-16LE';  //  UTF-16 (BIFF8)
            case 1250:  return 'CP1250';    //  ANSI Latin II (Central European)
            case 1251:  return 'CP1251';    //  ANSI Cyrillic
            case 0:                         //  CodePage is not always correctly set when the xls file was saved by Apple's Numbers program
            case 1252:  return 'CP1252';    //  ANSI Latin I (BIFF4-BIFF7)
            case 1253:  return 'CP1253';    //  ANSI Greek
            case 1254:  return 'CP1254';    //  ANSI Turkish
            case 1255:  return 'CP1255';    //  ANSI Hebrew
            case 1256:  return 'CP1256';    //  ANSI Arabic
            case 1257:  return 'CP1257';    //  ANSI Baltic
            case 1258:  return 'CP1258';    //  ANSI Vietnamese
            case 1361:  return 'CP1361';    //  ANSI Korean (Johab)
            case 10000: return 'MAC';       //  Apple Roman
            case 32768: return 'MAC';       //  Apple Roman
            case 32769: throw new Exception('Code page 32769 not supported.');  //  ANSI Latin I (BIFF2-BIFF3)
            case 65001: return 'UTF-8';     //  Unicode (UTF-8)
            default:    throw new Exception('Unknown code page: ' . $codePage);
        }
    }
    

    Finally, store the returned value in $_defaultEncoding.
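
Once a charset name such as 'CP1251' is known, it can be handed to iconv, much like the command-line workaround in the question. This is a hedged sketch rather than the reader's own conversion path: toUtf8() is a hypothetical helper, the sample bytes are made up, and it assumes the iconv extension is available and recognises the name.

```php
<?php
// Hypothetical helper: convert a byte string read from the workbook to UTF-8.
function toUtf8(string $raw, string $encoding): string
{
    $converted = iconv($encoding, 'UTF-8', $raw);
    if ($converted === false) {
        // iconv() returns false when the input cannot be converted.
        throw new RuntimeException("Could not convert from $encoding");
    }
    return $converted;
}

$raw = "\xCF\xF0\xE8\xE2\xE5\xF2"; // the Russian word "Привет" encoded as CP1251
echo toUtf8($raw, 'CP1251'), "\n";
```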

    Alternatively, switch to an Excel reader that handles the codepage correctly in the first place.
