The Archive Base

Asked by Editorial Team on June 19, 2026

How can I access the underlying Unicode data of MATLAB strings through the MATLAB Engine or MEX C interfaces?

Here’s an example. Let’s put Unicode characters in a UTF-8 encoded file test.txt, then read it in MATLAB:

fid=fopen('test.txt','r','l','UTF-8');
s=fscanf(fid, '%s')
fclose(fid);

Now, if I first run feature('DefaultCharacterSet', 'UTF-8') and then call engEvalString(ep, "s") from C, the output I get back is the text from the file, encoded as UTF-8. This shows that MATLAB stores the text as Unicode internally. However, if I call mxArrayToString(engGetVariable(ep, "s")), I get what unicode2native(s, 'Latin-1') would return in MATLAB: every non-Latin-1 character is replaced by character code 26. What I need is access to the underlying Unicode data as a C string in any Unicode encoding (UTF-8, UTF-16, etc.), with the non-Latin-1 characters preserved. Is this possible?
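To make the loss concrete, the substitution described above can be mimicked in plain C on a buffer of 16-bit code units. This is only an illustration: `to_latin1_lossy` is a hypothetical helper, not a MATLAB function, and it assumes the conversion works one 16-bit unit at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mimicking the lossy conversion described above:
 * each 16-bit code unit becomes one Latin-1 byte, and any unit above
 * 0xFF (not representable in Latin-1) becomes 0x1A (decimal 26, SUB).
 * Writes n bytes plus a terminating NUL into out (size >= n + 1) and
 * returns how many characters were substituted. */
size_t to_latin1_lossy(const uint16_t *units, size_t n, char *out)
{
    size_t i, lost = 0;
    for (i = 0; i < n; ++i) {
        if (units[i] <= 0xFF) {
            out[i] = (char) units[i];   /* Latin-1 maps 1:1 onto U+0000..U+00FF */
        } else {
            out[i] = 0x1A;              /* substitute character, code 26 */
            ++lost;
        }
    }
    out[n] = '\0';
    return lost;
}
```

For A, À and 水 (code units 0x0041, 0x00C0, 0x6C34) it yields the bytes 0x41, 0xC0, 0x1A and reports one lost character.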

My platform is OS X, MATLAB R2012b.

Addendum: The documentation explicitly states that “[mxArrayToString()] supports multibyte encoded characters”, yet it still gives me only a Latin-1 approximation to the original data.

    Answer by Editorial Team, added June 19, 2026 at 2:07 am

    First, let me share a few references I found online:

    • According to the mxChar documentation,

      MATLAB stores characters as 2-byte Unicode characters on machines with
      multi-byte character sets

      Still, the term MBCS is somewhat ambiguous to me. I think they meant UTF-16 in this context, although I’m not sure about surrogate pairs; if those aren't supported, it would really be UCS-2 instead.

      UPDATE: MathWorks changed the wording to:

      MATLAB uses 16-bit unsigned integer character encoding for Unicode characters.

    • The mxArrayToString page states that it does handle multibyte encoded characters (unlike mxGetString, which only handles single-byte encoding schemes). Unfortunately, it gives no example of how to do this.

    • Finally, here is a thread on the MATLAB newsgroup that mentions a couple of undocumented functions related to this (you can find them yourself by loading the libmx.dll library into a tool like Dependency Walker on Windows).
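    The UTF-16 versus UCS-2 question only matters for characters above U+FFFF: UTF-16 stores them as two 16-bit code units (a surrogate pair), while UCS-2 cannot represent them at all. A minimal plain-C sketch of the surrogate arithmetic, useful for checking what ends up in an mxChar buffer; the helper names are hypothetical and none of this is a MATLAB API:

```c
#include <stdint.h>

/* 0xD800..0xDBFF are high (leading) surrogates,
 * 0xDC00..0xDFFF are low (trailing) surrogates. */
int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combine a valid surrogate pair into a code point in U+10000..U+10FFFF. */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000u + (((uint32_t) (hi - 0xD800u)) << 10) + (uint32_t) (lo - 0xDC00u);
}

/* Split a code point in U+10000..U+10FFFF into its surrogate pair. */
void split_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    uint32_t v = cp - 0x10000u;                  /* 20-bit offset */
    *hi = (uint16_t) (0xD800u + (v >> 10));      /* top 10 bits */
    *lo = (uint16_t) (0xDC00u + (v & 0x3FFu));   /* bottom 10 bits */
}
```

    For example, U+1F600 splits into 0xD83D 0xDE00, so a char array holding that single character would have two elements with these values if MATLAB stores it as UTF-16.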


    Here’s a small experiment I did in MEX:

    my_func.c

    #include "mex.h"
    
    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
    {
        char str_ascii[] = {0x41, 0x6D, 0x72, 0x6F, 0x00};   // {'A','m','r','o',0}
        char str_utf8[] = {
            0x41,                   // U+0041
            0xC3, 0x80,             // U+00C0
            0xE6, 0xB0, 0xB4,       // U+6C34
            0x00
        };
        char str_utf16_le[] = {
            0x41, 0x00,             // U+0041
            0xC0, 0x00,             // U+00C0
            0x34, 0x6C,             // U+6C34
            0x00, 0x00
        };
    
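        if (nlhs != 3) {   // plhs only has room for the outputs actually requested
            mexErrMsgIdAndTxt("mex:error", "Expecting three output arguments");
        }
        plhs[0] = mxCreateString(str_ascii);
        plhs[1] = mxCreateString_UTF8(str_utf8);        // undocumented!
        plhs[2] = mxCreateString_UTF16(str_utf16_le);   // undocumented!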
    }
    

    I create three strings in C code, encoded in ASCII, UTF-8, and UTF-16LE respectively, and pass them to MATLAB using the mxCreateString MEX function and its undocumented variants.

    I got the byte sequences from the FileFormat.info website for A (U+0041), À (U+00C0), and 水 (U+6C34).
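    The same byte sequences can also be derived from the code points with the standard UTF-8 bit layout (1 to 4 bytes depending on the range). A plain-C sketch; `utf8_encode` is a hypothetical helper, not part of the MEX API:

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 for values that are not
 * Unicode scalar values (surrogates, or anything above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                              /* 0xxxxxxx */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                             /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogates are not characters */
    if (cp < 0x10000) {                           /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                         /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

    Applied to U+0041, U+00C0 and U+6C34 it gives 41, C3 80 and E6 B0 B4, matching the str_utf8 array above.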

    Let’s test the above function inside MATLAB:

    %# call the MEX function
    [str_ascii, str_utf8, str_utf16_le] = my_func()
    
    %# MATLAB exposes the two strings in a decoded form (Unicode code points)
    double(str_utf8)       %# decimal form: [65, 192, 27700]
    assert(isequal(str_utf8, str_utf16_le))
    
    %# convert them to bytes (in HEX)
    b1 = unicode2native(str_utf8, 'UTF-8')
    b2 = unicode2native(str_utf16_le, 'UTF-16')
    cellstr(dec2hex(b1))'  %# {'41','C3','80','E6','B0','B4'}
    cellstr(dec2hex(b2))'  %# {'FF','FE','41','00','C0','00','34','6C'}
                           %# (note that the first two bytes are the byte order mark, BOM)
    
    %# show string
    view_unicode_string(str_utf8)
    

    [Screenshot: the string AÀ水 displayed in large type in a MATLAB figure]

    I use MATLAB's embedded Java support to display the strings:

    function view_unicode_string(str)
        %# create Swing JLabel
        jlabel = javaObjectEDT('javax.swing.JLabel', str);
        font = java.awt.Font('Arial Unicode MS', java.awt.Font.PLAIN, 72);
        jlabel.setFont(font);
        jlabel.setHorizontalAlignment(javax.swing.SwingConstants.CENTER);
    
        %# place Java component inside a MATLAB figure
        hfig = figure('Menubar','none');
        [~,jlabelHG] = javacomponent(jlabel, [], hfig);
        set(jlabelHG, 'Units','normalized', 'Position',[0 0 1 1])
    end
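    The b2 bytes printed in the test above (FF FE 41 00 C0 00 34 6C) follow directly from the UTF-16LE layout: a byte order mark U+FEFF, then each 16-bit code unit with its low byte first. A plain-C sketch that reproduces them; `utf16le_with_bom` is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>

/* Serialize n UTF-16 code units as little-endian bytes, preceded by the
 * byte order mark U+FEFF (which appears as FF FE in little-endian order).
 * out must have room for 2 * (n + 1) bytes; returns the bytes written. */
size_t utf16le_with_bom(const uint16_t *units, size_t n, unsigned char *out)
{
    size_t i, k = 0;
    out[k++] = 0xFF;                                    /* BOM, low byte first */
    out[k++] = 0xFE;
    for (i = 0; i < n; ++i) {
        out[k++] = (unsigned char) (units[i] & 0xFF);   /* low byte  */
        out[k++] = (unsigned char) (units[i] >> 8);     /* high byte */
    }
    return k;
}
```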
    

    Now let’s work in the reverse direction (accepting a string from MATLAB into C):

    my_func_reverse.c

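    #include "mex.h"
    #include <string.h>   /* strlen */
    
    /* print each byte in hex, followed by the terminating NUL */
    void print_hex(const unsigned char* s, size_t len)
    {
        size_t i;
        for(i=0; i<len; ++i) {
            mexPrintf("0x%02X ", s[i]);
        }
        mexPrintf("0x00\n");
    }
    
    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
    {
        char *str;
        if (nrhs<1 || !mxIsChar(prhs[0])) {
            mexErrMsgIdAndTxt("mex:error", "Expecting a string");
        }
        str = mxArrayToString_UTF8(prhs[0]);     // get UTF-8 encoded string from Unicode (undocumented!)
        if (str == NULL) {                       // assumed to signal failure like mxArrayToString
            mexErrMsgIdAndTxt("mex:error", "Could not convert the input to UTF-8");
        }
        print_hex((const unsigned char*) str, strlen(str));   // print bytes
        plhs[0] = mxCreateString_UTF8(str);      // create Unicode string from UTF-8 (undocumented!)
        mxFree(str);
    }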
    

    And we test this from inside MATLAB:

    >> s = char(hex2dec(['0041';'00C0';'6C34'])');   %# "\u0041\u00C0\u6C34"
    >> ss = my_func_reverse(s);
    0x41 0xC3 0x80 0xE6 0xB0 0xB4 0x00               %# UTF-8 encoding
    >> assert(isequal(s,ss))
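    Since mxArrayToString_UTF8 is undocumented, a more durable route is the documented mxGetChars, which returns a pointer to the array's 16-bit mxChar data, together with mxGetNumberOfElements; the conversion to UTF-8 is then done in your own code. A plain-C sketch of that conversion, assuming the mxChar data is UTF-16 (per the surrogate-pair caveat above); `utf16_to_utf8` is a hypothetical helper:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Convert n UTF-16 code units into a newly malloc'ed, NUL-terminated
 * UTF-8 string. Unpaired surrogates become U+FFFD. Caller frees it. */
char *utf16_to_utf8(const uint16_t *u, size_t n)
{
    /* worst case is 3 bytes per code unit (a pair gives 4 bytes for 2 units) */
    unsigned char *out = malloc(3 * n + 1);
    size_t i = 0, k = 0;
    if (out == NULL) return NULL;
    while (i < n) {
        uint32_t cp = u[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF && i < n && u[i] >= 0xDC00 && u[i] <= 0xDFFF) {
            cp = 0x10000u + ((cp - 0xD800u) << 10) + (uint32_t) (u[i++] - 0xDC00u);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;                                   /* unpaired surrogate */
        }
        if (cp < 0x80) {
            out[k++] = (unsigned char) cp;
        } else if (cp < 0x800) {
            out[k++] = (unsigned char) (0xC0 | (cp >> 6));
            out[k++] = (unsigned char) (0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[k++] = (unsigned char) (0xE0 | (cp >> 12));
            out[k++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[k++] = (unsigned char) (0x80 | (cp & 0x3F));
        } else {
            out[k++] = (unsigned char) (0xF0 | (cp >> 18));
            out[k++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
            out[k++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[k++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    }
    out[k] = '\0';
    return (char *) out;
}
```

    In a MEX function this would be called as utf16_to_utf8((const uint16_t *) mxGetChars(prhs[0]), mxGetNumberOfElements(prhs[0])), releasing the result with free rather than mxFree. The same applies to an array obtained with engGetVariable in an engine program.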
    

    Finally, if for some reason you still have problems, the easiest workaround is to convert the non-ASCII strings to the uint8 data type (their UTF-8 bytes) before passing them from MATLAB to your engine program.

    So inside the MATLAB process do:

    %# read contents of a UTF-8 file
    fid = fopen('test.txt', 'rb', 'native', 'UTF-8');
    str = fread(fid, '*char')';
    fclose(fid);
    str_bytes = unicode2native(str,'UTF-8');  %# convert to bytes
    
    %# or simply read the file contents as bytes to begin with
    %fid = fopen('test.txt', 'rb');
    %str_bytes = fread(fid, '*uint8')';
    %fclose(fid);
    

    and access the variable using the Engine API as:

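    mxArray *arr = engGetVariable(ep, "str_bytes");             // a copy; NULL if the variable does not exist
    const uint8_T *bytes = (const uint8_T*) mxGetData(arr);
    size_t nbytes = mxGetNumberOfElements(arr);                 // the byte array is NOT null-terminated
    // now decode these nbytes of UTF-8 on your end ...
    mxDestroyArray(arr);                                        // free the copy when done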
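    The decoding step on the C side can be sketched as a small UTF-8 decoder that walks the byte buffer one sequence at a time; the buffer length has to come from mxGetNumberOfElements because the data is not NUL-terminated. This is plain C with a hypothetical helper name, utf8_decode, and it assumes uint8_T is MATLAB's unsigned 8-bit typedef, so the pointer from mxGetData can be passed in with a cast.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s[0], with len bytes available.
 * Stores the code point in *cp and returns the sequence length, or 0 if
 * the bytes are not well-formed UTF-8 (overlong forms, surrogates and
 * values above U+10FFFF are rejected). */
size_t utf8_decode(const uint8_t *s, size_t len, uint32_t *cp)
{
    size_t n, i;
    uint32_t c;
    if (len == 0) return 0;
    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
    else return 0;                                /* stray continuation or invalid lead byte */
    if (len < n) return 0;                        /* truncated sequence */
    for (i = 1; i < n; ++i) {
        if ((s[i] & 0xC0) != 0x80) return 0;      /* expected a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if ((n == 2 && c < 0x80) || (n == 3 && c < 0x800) || (n == 4 && c < 0x10000))
        return 0;                                 /* overlong encoding */
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                                 /* not a Unicode scalar value */
    *cp = c;
    return n;
}
```

    A caller loops while the current position is below the buffer length, advancing by the returned length and treating a return value of 0 as malformed input.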
    

    All tests were done on Windows XP running R2012b, with the default character set:

    >> feature('DefaultCharacterSet')
    ans =
    windows-1252
    

    Hope this helps.


    EDIT:

    In MATLAB R2014a, many undocumented C functions were removed from the libmx library (including the ones used above) and replaced with equivalent C++ functions exposed under the namespace matrix::detail::noninlined::mx_array_api.

    It should be easy to adjust the examples above (as explained here) to run on R2014a.

© 2021 The Archive Base. All Rights Reserved