I’m parsing an XML file which can contain localized strings in different languages (at

Question

Asked: May 22, 2026

I’m parsing an XML file which can contain localized strings in different languages (at the moment it’s just English and Spanish, but in the future it could be any language). The API for the XML parser returns all data within the XML via a char* which is UTF-8 encoded.

Some manipulation of the data is required after it has been parsed (searching within it for substrings, concatenating strings, determining the length of substrings, etc.).

It would be convenient to use standard functions such as strlen, strcat, etc. As the raw data I’m receiving from the XML parser is a char*, I can do all manipulation readily using these standard string handling functions.

However, these all of course assume and require that the strings are null-terminated.
My question therefore is: if you have wide data represented as a char*, can a null terminator character occur within the data rather than at the end?

i.e. if a character in a certain language doesn’t require 2 bytes to represent it, and it is represented in one byte, will/can the other byte be zero?


1 Answer

Editorial Team · Answer 1 · 2026-05-22T23:22:18+00:00

Added an answer on May 22, 2026 at 11:22 pm

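UTF-8 is not “wide”. UTF-8 is a multibyte encoding in which a Unicode character takes 1 to 4 bytes. A zero byte never occurs inside a valid multibyte character: every byte of a multibyte sequence has its high bit set, so a zero byte can only be the NUL character (U+0000) itself. That means strlen, strcat and strstr work on UTF-8 data, though they count and compare bytes, not characters. Make sure you are not confused about what your parser is giving you. It could be UTF-16 or UCS-2, or a 4-byte encoding such as UTF-32, placed in wide character strings, in which case you have to treat them as wide strings.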
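To make the difference concrete, here is a minimal sketch in C. The helper names `utf8_code_points` and `has_zero_byte` are made up for illustration and are not part of any parser API, and the code assumes the input is already valid UTF-8. It shows that strlen-style scanning is safe on UTF-8 but reports bytes rather than characters, and that the same text stored as UTF-16 does contain zero bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated
 * UTF-8 string. Continuation bytes have the bit pattern 10xxxxxx, so
 * every byte that is NOT a continuation byte starts a new code point.
 * Assumes the input is valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Hypothetical helper: return 1 if any of the first len bytes of buf
 * is zero, 0 otherwise. */
int has_zero_byte(const unsigned char *buf, size_t len)
{
    return memchr(buf, 0, len) != NULL;
}

/* "señal" in UTF-8: the ñ is the two bytes 0xC3 0xB1. The literal is
 * split so that the 'a' is not swallowed by the \xB1 hex escape. */
const char utf8_word[] = "se\xC3\xB1" "al";

/* "se" in UTF-16LE: each ASCII letter is followed by a zero byte. */
const unsigned char utf16_word[] = { 0x73, 0x00, 0x65, 0x00 };
```

With these definitions, `strlen(utf8_word)` is 6 (bytes) while `utf8_code_points(utf8_word)` is 5 (characters), and `has_zero_byte` finds nothing in the six UTF-8 bytes. Passing `utf16_word` to strlen would stop after the first byte, which is why UTF-16 data must not be handled with the char* string functions.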
