The Archive Base

Editorial Team
Asked: May 29, 2026 (2026-05-29T23:37:12+00:00)

I have the following requirements:

…The document must be encoded in UTF-8… The Lastname field only allows (Extended) ASCII … City only allows ISOLatin1
…The message must be put on the (IBM WebSphere) MessageQueue as an IBytesMessage

The XML document, for simplicity's sake, looks like this:

<?xml version="1.0" encoding="utf-8"?>
<foo>
  <lastname>John ÐØë</lastname>
  <city>John ÐØë</city>
  <other>UTF-8 string</other>
</foo>

The characters in the “ÐØë” part are (or should be) the (Extended) ASCII values 208, 216 and 235 respectively.

I also have an object:

public class foo {
  public string lastname { get; set; }
  public string city { get; set; }
}

So I instantiate an object and set the lastname:

var x = new foo() { lastname = "John ÐØë", city = "John ÐØë" };

Now this is where my headache sets in (or the inception if you will…):

  • Visual Studio / the source code is in Unicode
  • Hence: the object has a Unicode lastname
  • The XML Serializer uses UTF-8 to encode the document
  • Lastname should contain only (Extended) ASCII characters; the characters are valid ASCII chars but, of course, in UTF-8 encoded form

I normally don’t experience any trouble with my encodings; I am familiar with The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) but this one’s got me stumped…

I understand that the UTF-8 document will be perfectly able to “contain” both encodings because the codepoints ‘overlap’. But where I get lost is when I need to convert the serialized message to a byte-array. When doing a dump I see C3 XX C3 XX C3 XX (I don’t have the actual dump at hand). It’s clear (or I’ve been staring at this for too long) that the lastname / city strings are put in the serialized document in their unicode form; the byte-array suggests so.

Now what will I have to do, and where, to ensure the Lastname string goes into the XML document and finally the byte-array as an ASCII string (and the actual 208, 216, 235 byte sequence), and that City makes it in there as ISOLatin1?

I know the requirements are backwards, but I can’t change those (3rd party). I always use UTF-8 for our internal projects, so I have to support the unicode-utf8=>ASCII/ISOLatin1 conversion (of course, only for chars that are in those sets).

My head hurts…

1 Answer

  1. Editorial Team
    Added an answer on May 29, 2026 at 11:37 pm

    Never mind how the XML document is encoded for transmission. The right way to do what you want to do—encode certain non-ASCII characters so they survive the trip unscathed—is to use XML character references to represent the characters that need to be so preserved. For instance, your

    ÐØë
    

    is represented using XML character references as

    &#x00D0;&#x00D8;&#x00EB;
    

    The receiving [conformant] XML processor will/should/must convert those numeric character references back to the characters they represent. Here’s some code that will do the trick:

    using System.Text ;
    
    public static class XmlStringExtensions
    {
      // Replaces every character outside printable ASCII with a numeric character reference.
      public static string ConvertToXmlCharacterReference( this string xml )
      {
        StringBuilder sb  = new StringBuilder( xml.Length ) ;
        const char    SP  = '\u0020' ; // anything lower than SP is a control character
        const char    DEL = '\u007F' ; // DEL is itself a control character; anything above it isn't ASCII
    
        foreach( char ch in xml )
        {
          bool isPrintableAscii = ch >= SP && ch < DEL ;
    
          // a character reference must end with ';'
          if ( isPrintableAscii ) { sb.Append(ch)                              ; }
          else                    { sb.AppendFormat( "&#x{0:X4};" , (int) ch ) ; }
    
        }
    
        return sb.ToString() ;
      }
    }
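
As a rough check of the idea, here is a small sketch that escapes the sample value, dumps UTF-8 bytes, and parses the escaped element back with `XElement.Parse`. The `CharRefDemo` class and its method names are made up for illustration, and `Escape` is only a compact stand-in for the method above. It assumes the value contains no markup characters such as `&` or `<`.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class CharRefDemo
{
    // Compact stand-in for the answer's method: printable ASCII passes through,
    // everything else becomes a hexadecimal character reference.
    // Assumption: the input contains no markup characters such as '&' or '<'.
    public static string Escape(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char ch in text)
        {
            if (ch >= '\u0020' && ch < '\u007F') sb.Append(ch);
            else sb.AppendFormat("&#x{0:X4};", (int)ch);
        }
        return sb.ToString();
    }

    // Hyphen-separated hex dump of the UTF-8 encoding of a string.
    public static string Utf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }

    // What a conformant XML parser hands back after reading the escaped element.
    public static string RoundTrip(string value)
    {
        string xml = "<lastname>" + Escape(value) + "</lastname>";
        return XElement.Parse(xml).Value;
    }
}
```

For the unescaped `ÐØë`, `Utf8Hex` gives `C3-90-C3-98-C3-AB`, which is the C3 XX pattern from the dump in the question. The escaped form contains only ASCII characters, so its bytes are the same in ASCII, ISO Latin 1 and UTF-8. Note that this does not put the raw bytes 208, 216, 235 on the wire; that byte sequence on its own would not be valid UTF-8.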
    

    You could also use a regular expression to make the replacement or write an XSLT that would do the same thing. But the task is so trivial, it doesn’t really warrant that sort of approach. The above code is probably faster and less memory intensive and…it’s easier to understand.
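
For comparison, the regular expression variant mentioned here might look roughly like the sketch below. The `RegexCharRef` class name is hypothetical, and the pattern simply matches any single UTF-16 code unit outside the printable ASCII range.

```csharp
using System.Text.RegularExpressions;

public static class RegexCharRef
{
    // Matches any single UTF-16 code unit outside printable ASCII (U+0020..U+007E).
    private static readonly Regex NonPrintableAscii = new Regex(@"[^\u0020-\u007E]");

    // Replaces each match with a four-digit hexadecimal character reference.
    public static string Escape(string text)
    {
        return NonPrintableAscii.Replace(
            text,
            m => string.Format("&#x{0:X4};", (int)m.Value[0]));
    }
}
```

It produces the same output as the loop for the sample value, so the choice between the two is mostly a matter of taste.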

    You should note though that since you want to preserve two different encodings in the same document, your conversion routine will need to differentiate between the conversion from “extended ASCII” to an XML character reference and the conversion from “ISO Latin 1” to an XML character reference.
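
One way to keep the two fields apart is to check each value against its target character set before escaping it. The sketch below is an assumption about how that could be done, not something the third party prescribes: `FieldCharsetCheck` and `FitsEncoding` are hypothetical names, and which code page "Extended ASCII" actually means has to be confirmed with the receiver (the values 208, 216, 235 for ÐØë match ISO-8859-1 and also Windows-1252).

```csharp
using System;
using System.Text;

public static class FieldCharsetCheck
{
    // Returns true if every character of value can be encoded in the named encoding.
    // The strict fallbacks make GetBytes throw instead of silently substituting '?'.
    public static bool FitsEncoding(string value, string encodingName)
    {
        Encoding strict = Encoding.GetEncoding(
            encodingName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(value);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

A call such as `FieldCharsetCheck.FitsEncoding(x.city, "iso-8859-1")` could then reject a value up front, before it is serialized.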

    In both cases, the character reference specifies a code point in the ISO/IEC 10646 character set, which is essentially Unicode. You’ll want to map the characters to the appropriate code point. Since strings in the CLR world are UTF-16 encoded, that shouldn’t be much of an issue. The above code should work fine, I believe, unless you’ve got something really oddball that doesn’t play very nicely with UTF-16.
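
The main case that does not play nicely with UTF-16 is a character outside the Basic Multilingual Plane, which a CLR string stores as a surrogate pair. A per-`char` loop would emit two references for it, and references to surrogate code points are not valid XML. If such input is possible, a variant along these lines (the `CodePointCharRef` name is hypothetical) emits one reference per code point instead:

```csharp
using System.Text;

public static class CodePointCharRef
{
    // Emits one character reference per Unicode code point rather than per UTF-16 code unit.
    public static string Escape(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char ch = text[i];
            if (ch >= '\u0020' && ch < '\u007F')
            {
                sb.Append(ch); // printable ASCII passes through unchanged
            }
            else if (char.IsHighSurrogate(ch) && i + 1 < text.Length
                     && char.IsLowSurrogate(text[i + 1]))
            {
                // Combine the surrogate pair into a single code point.
                int codePoint = char.ConvertToUtf32(ch, text[i + 1]);
                sb.AppendFormat("&#x{0:X};", codePoint);
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                // Note: an unpaired surrogate ends up here and still yields an invalid reference.
                sb.AppendFormat("&#x{0:X4};", (int)ch);
            }
        }
        return sb.ToString();
    }
}
```

For fields limited to (Extended) ASCII or ISO Latin 1 this branch never triggers, so it only matters for the free-form UTF-8 parts of the document.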
