The Archive Base
The Archive Base Latest Questions

Editorial Team
Asked: May 13, 2026 (2026-05-13T08:39:38+00:00)

I have a file containing Unicode characters on a server running linux. If I

I have a file containing Unicode characters on a server running Linux. If I SSH into the server and use tab-completion to navigate to the file or folder containing Unicode characters, I have no problem accessing it. The problem arises when I try to access the file via PHP (the function I was using to access the file system was stat). If I output the path generated by the PHP script to the browser and paste it into the terminal, the file also seems not to exist (even though the two file paths look exactly the same in the terminal).

I set PHP to use UTF-8 as its default encoding via php.ini and also set mb_internal_encoding. I checked the encoding of the PHP file path string and it comes out as UTF-8, as it should. Poking around a bit more, I decided to hexdump the é character that the terminal’s tab-completion produces and compare it to the hexdump of the ‘regular’ é character created by the PHP script or by entering the character manually via the keyboard (option+e, then e on OS X). Here is the result:

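# é as produced by tab-completion: "e" followed by U+0301 (bytes 65 cc 81)
echo -n é | hexdump
0000000 cc65 0081
0000003
# é as typed on the keyboard or produced by PHP: U+00E9 (bytes c3 a9)
echo -n é | hexdump
0000000 a9c3
0000002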

The é character that allows a correct file reference in the terminal is the 3-byte one. I’m not sure where to go from here. What encoding should I use in PHP? Should I be converting the path to another encoding via iconv or mb_convert_encoding?
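For reference, the two dumps above can be reproduced with a small Python sketch. By default hexdump prints little-endian 16-bit words, so `cc65 0081` is the byte sequence 65 cc 81 ("e" followed by U+0301 COMBINING ACUTE ACCENT) and `a9c3` is c3 a9 (the single code point U+00E9). The variable names here are illustrative only and do not come from the original script.

```python
import unicodedata

# Precomposed form: the single code point U+00E9.
composed = "\u00e9"
# Decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(composed_bytes.hex(" "))    # c3 a9
print(decomposed_bytes.hex(" "))  # 65 cc 81

# The two strings render identically but do not compare equal ...
print(composed == decomposed)     # False
# ... until both are brought into the same normalization form.
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

Both strings are valid UTF-8, which is why checking the encoding alone does not reveal the difference.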


1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 8:39 am (2026-05-13T08:39:38+00:00)

    Thanks to the tips given in the two answers, I was able to poke around and find some methods for normalizing the different Unicode decompositions of a given character. In my situation, I was accessing files created by an OS X Carbon application. It is a fairly popular application, and its file names seem to adhere to a specific Unicode decomposition.

    In PHP 5.3 a [new set of functions][1] was introduced that allows you to normalize a Unicode string to a particular normalization form. There are four normalization forms (NFC, NFD, NFKC, and NFKD) that you can convert a Unicode string into. Python has had Unicode normalization capabilities since version 2.3 via [unicodedata.normalize][2]. [This article][3] on Python’s handling of Unicode strings was helpful in understanding encoding and string handling a bit better.
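To make the four forms concrete, here is a minimal Python sketch that lists the code points each form produces for a given string. The helper name `normalization_forms` is hypothetical and only used for illustration; `unicodedata.normalize` is the standard-library call.

```python
import unicodedata

def normalization_forms(text):
    """Map each normalization form name to the code points it yields for `text`."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {
        form: [f"U+{ord(ch):04X}" for ch in unicodedata.normalize(form, text)]
        for form in forms
    }

# U+00E9 (é): the canonical forms differ (composed vs. decomposed).
for form, code_points in normalization_forms("\u00e9").items():
    print(form, code_points)

# U+FB01 (the "fi" ligature): only the compatibility forms (NFKC, NFKD) split it.
for form, code_points in normalization_forms("\ufb01").items():
    print(form, code_points)
```

For file names, the canonical forms (NFC and NFD) are usually the relevant pair, since the compatibility forms can change what the text actually says.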

    Here is a quick Python example of normalizing a Unicode file path:

    import unicodedata
    filePath = unicodedata.normalize('NFD', filePath)  # decompose, e.g. é becomes e + U+0301
    

    I found that the NFD form worked for all my purposes. I wonder if this is the standard decomposition for Unicode filenames.
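If it is not known in advance which form the names on disk use, one option is to try several. The following is only a sketch under that assumption: `find_existing_path` is a hypothetical helper, not part of any library, and it normalizes the whole path at once, so it will not help when different path components use different forms.

```python
import os
import unicodedata

def find_existing_path(path):
    """Return the first variant of `path` that exists on disk, or None.

    Tries the path as given, then its NFC and NFD normalizations.
    """
    candidates = [path]
    for form in ("NFC", "NFD"):
        variant = unicodedata.normalize(form, path)
        if variant not in candidates:
            candidates.append(variant)
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    return None
```

The returned path can then be passed to `os.stat` or `open` in place of the original string.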
    [1]: https://www.php.net/manual/en/class.normalizer.php
    [2]: http://docs.python.org/library/unicodedata.html
    [3]: http://boodebr.org/main/python/all-about-python-and-unicode

© 2021 The Archive Base. All Rights Reserved