Editorial Team
Asked: June 16, 2026


I seem to be having an issue converting a byte array (containing the text from a word document) to a LPTSTR (wchar_t *) object. Every time the code executes, I am getting a bunch of unwanted Unicode characters returned.

I figure it is because I am not making the proper calls somewhere or not using the variables properly, but I'm not quite sure how to approach this. Hopefully someone here can point me in the right direction.

The first thing that happens is that we call into C# code to open Microsoft Word and convert the text in the document into a byte array:

byte document __gc[];
document = word->ConvertToArray(filename);

The contents of document are as follows:

{84, 101, 115, 116, 32, 68, 111, 99, 117, 109, 101, 110, 116, 13, 10}

Which decodes to the string “Test Document” followed by a carriage return and line feed (bytes 13 and 10).

Our next step is to allocate memory for an LPTSTR variable that will hold the contents of the byte array:

byte __pin * value;

value = &document[0];

LPTSTR image;
image = (LPTSTR)malloc( document->Length + 1 );

Once we execute the line where we start allocating the memory, our image variable gets filled with a bunch of unwanted Unicode characters:

췍췍췍췍췍췍췍췍﷽﷽����˿於潁

Then we do a memcpy to copy all of the data over:

memcpy(image,value,document->Length);

Which just causes more unwanted Unicode characters to appear:

敔瑳䐠捯浵湥൴촊﷽﷽����˿於潁

I suspect the problem is either in how we store the values in the byte array or in how we copy the data from the byte array to the LPTSTR variable. Any help explaining what I’m doing wrong, or pointers in the right direction, would be greatly appreciated.


1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 2:05 am

    First you should learn something about text data and how it’s represented. A reference that will get you started there is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

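    byte is just a typedef for an 8-bit type (unsigned char in the Windows headers), so the byte array holds the string in some narrow, char-based encoding. Copying those bytes into a wchar_t buffer doesn't convert anything: every pair of bytes is simply read as one UTF-16 code unit, so “Te” (0x54, 0x65) becomes 敔 (U+6554). You need to actually convert from the source encoding, whatever it is, into UTF-16, which is what wchar_t holds on Windows. Here’s the typical method recommended for doing such conversions on Windows: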

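    // Needs <windows.h>, <string> and <cassert>.
    // value points at document->Length bytes with no terminating 0, so copy them into
    // a std::string first; c_str() then supplies the terminator that -1 relies on.
    std::string narrow((const char *)value, document->Length);
    int output_size = MultiByteToWideChar(CP_ACP,0,narrow.c_str(),-1,NULL,0);
    assert(0<output_size);
    wchar_t *converted_buf = new wchar_t[output_size]; // release with delete[] when done
    int size = MultiByteToWideChar(CP_ACP,0,narrow.c_str(),-1,converted_buf,output_size);
    assert(output_size==size);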
    

    We call MultiByteToWideChar() twice: once to find out how large a buffer is needed to hold the result of the conversion, and a second time, passing in the buffer we allocated, to do the actual conversion.

    CP_ACP specifies the source encoding, and you’ll need to check the code or documentation for whatever produces the byte array (here, the C# ConvertToArray call) to find out what that value really should be. CP_ACP stands for ‘code page: ANSI code page’, which is Microsoft’s way of saying ‘the encoding set for "non-Unicode" programs.’ The C# side may be using something else, such as CP_UTF8 (we can hope) or code page 1252.

    See the MultiByteToWideChar documentation for the remaining arguments.


    “Once we execute the line where we start allocating the memory, our image variable gets filled with a bunch of unwanted Unicode characters.”

    When you call malloc(), the memory you get back is uninitialized and just contains garbage. The values you see before initializing it don’t matter, and you simply shouldn’t use that data; the only data that matters is what you fill the buffer with. (In a debug build, the MSVC debug heap fills new allocations with 0xCD bytes and pads them with 0xFD guard bytes, which read as 췍 (U+CDCD) and ﷽ (U+FDFD) when viewed as wchar_t; that is where those characters in your output come from.) The MultiByteToWideChar() code above also null-terminates the string, so you won’t see garbage in unused buffer space, and because the buffer is sized from the first call, there is no unused space anyway.


    The above code is not actually very good C++ style. It’s just typical usage of the C-style API provided by Win32. The way I prefer to do conversions (if I’m forced to) is more like:

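    // Needs <locale>, <codecvt> and <string>; wstring_convert and codecvt_utf8_utf16 are deprecated since C++17.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t> convert; // converter object saved somewhere
    
    // value is a byte pointer with no terminator, so pass from_bytes() an explicit char range.
    const char *first = (const char *)value;
    std::wstring output = convert.from_bytes(first, first + document->Length);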
    

    (Assuming the char encoding being used is UTF-8. You’ll have to use a different codecvt facet for any other encoding.)


Related Questions

I seem to be having an odd issue whereby every time I try to
Might seem like a stupid question, but I'm having an issue creating an array
Seem to be having an issue with std::auto_ptr and assignment, such that the object
I seem to be having an issue replacing a text link with a link
I am having an issue that I can't seem to figure out. Hopefully somebody
I am having an xslt issue that I cannot seem to solve. Right now
I am having a strange issue that I can't seem to resolve. Here is
I seem to be having an issue with Jax-WS and Jax-b playing nicely together.
I seem to be having an issue with Xalan's translate method. I have the
I seem to be having a strange issue with a 2D game I'm working


© 2021 The Archive Base. All Rights Reserved