The Archive Base

Editorial Team
Asked: May 28, 2026 (2026-05-28T23:31:56+00:00)

I’ve written a program (in C#) that reads and manipulates MSIL programs that have been generated from C# programs. I had mistakenly assumed that the syntax rules for MSIL string constants are the same as for C#, but then I ran into the following situation:

This C# statement

string s = "Do you wish to send anyway?";

gets compiled into (among other MSIL statements) this

IL_0128:  ldstr      "Do you wish to send anyway\?"

I wasn’t expecting the backslash that is used to escape the question mark. Now I can obviously take this backslash into account as part of my processing, but mostly out of curiosity I’d like to know if there is a list somewhere of which characters get escaped when the C# compiler converts C# constant strings to MSIL constant strings.

Thanks.

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 11:31 pm

    Update

    Based on experimentation with the C# compiler and ildasm.exe: perhaps there is no list of escaped characters because there are so few, precisely six.

    Going from the IL generated by ildasm, from C# programs compiled by Visual Studio 2010:

    • The IL text emitted by ildasm is strictly ASCII.
    • Three traditional whitespace characters are escaped:
      • \t : 0x09 : (tab)
      • \n : 0x0A : (newline)
      • \r : 0x0D : (carriage return)
    • Three punctuation characters are escaped:
      • \" : 0x22 : (double quote)
      • \? : 0x3F : (question mark)
      • \\ : 0x5C : (backslash)
    • Only characters in the range 0x20 – 0x7E are included intact in literal strings (not including the three escaped punctuation characters).
    • All other characters, including the ASCII control characters below 0x20 and everything from 0x7F on up, are converted to byte arrays. Or rather, any string containing any character other than the 92 literal and 6 escaped characters above is converted in its entirety to a byte array, where the bytes are the little-endian bytes of a UTF-16 string.
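The rules in the list above can be sketched in C#. This is only a model of the behaviour observed in this experiment, not documented ildasm behaviour; the class and method names (`IldasmLiteralSketch`, `IsQuotable`, `ToLdstrOperand`) are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper that mimics the ildasm output observed in this answer.
public static class IldasmLiteralSketch
{
    // True if every character is either printable ASCII (0x20 - 0x7E)
    // or one of the three escaped whitespace characters.
    public static bool IsQuotable(string s)
    {
        foreach (char c in s)
        {
            bool printableAscii = c >= 0x20 && c <= 0x7E;
            bool escapedWhitespace = c == '\t' || c == '\n' || c == '\r';
            if (!printableAscii && !escapedWhitespace)
                return false;
        }
        return true;
    }

    // Returns the text expected after "ldstr" for the given string.
    public static string ToLdstrOperand(string s)
    {
        if (IsQuotable(s))
        {
            var quoted = new StringBuilder("\"");
            foreach (char c in s)
            {
                switch (c)
                {
                    case '\t': quoted.Append("\\t"); break;
                    case '\n': quoted.Append("\\n"); break;
                    case '\r': quoted.Append("\\r"); break;
                    case '"': quoted.Append("\\\""); break;
                    case '?': quoted.Append("\\?"); break;
                    case '\\': quoted.Append("\\\\"); break;
                    default: quoted.Append(c); break;
                }
            }
            return quoted.Append('"').ToString();
        }

        // Any other character forces the whole string into a byte array:
        // each UTF-16 code unit is written low byte first (little-endian).
        var hex = new StringBuilder("bytearray (");
        foreach (char c in s)
        {
            hex.Append(((byte)(c & 0xFF)).ToString("X2")).Append(' ');
            hex.Append(((byte)(c >> 8)).ToString("X2")).Append(' ');
        }
        return hex.Append(')').ToString();
    }
}
```

For the string from the question, `ToLdstrOperand("Do you wish to send anyway?")` yields the quoted form with `\?`, while any string containing é falls through to the byte array branch.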

    Example 1: Non-ASCII, above 0x7E: A simple accented é (U+00E9)

    C#: Either "é" or "\u00E9" becomes (E9 byte comes first)

    ldstr      bytearray (E9 00 )
    

    Example 2: UTF-16: Summation symbol ∑ (U+2211)

    C#: Either "∑" or "\u2211" becomes (11 byte comes first)

    ldstr      bytearray (11 22 )
    

    Example 3: UTF-32: Double-struck mathematical capital A 𝔸 (U+1D538)

    C#: Either "𝔸" or UTF-16 surrogate pair "\uD835\uDD38" becomes (bytes within each 16-bit char reversed, but the two chars kept in their overall order)

    ldstr      bytearray (35 D8 38 DD )
    

    Example 4: Byte array conversion applies to an entire string containing a non-ASCII character

    C#: "In the last decade, the German word \"über\" has come to be used frequently in colloquial English." becomes

    ldstr      bytearray (49 00 6E 00 20 00 74 00 68 00 65 00 20 00 6C 00  
                          61 00 73 00 74 00 20 00 64 00 65 00 63 00 61 00  
                          64 00 65 00 2C 00 20 00 74 00 68 00 65 00 20 00  
                          47 00 65 00 72 00 6D 00 61 00 6E 00 20 00 77 00  
                          6F 00 72 00 64 00 20 00 22 00 FC 00 62 00 65 00  
                          72 00 22 00 20 00 68 00 61 00 73 00 20 00 63 00  
                          6F 00 6D 00 65 00 20 00 74 00 6F 00 20 00 62 00  
                          65 00 20 00 75 00 73 00 65 00 64 00 20 00 66 00  
                          72 00 65 00 71 00 75 00 65 00 6E 00 74 00 6C 00  
                          79 00 20 00 69 00 6E 00 20 00 63 00 6F 00 6C 00  
                          6C 00 6F 00 71 00 75 00 69 00 61 00 6C 00 20 00  
                          45 00 6E 00 67 00 6C 00 69 00 73 00 68 00 2E 00 )
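
Going the other way, a byte array like the ones in the examples above can be turned back into a string. The following is a minimal sketch, assuming the input is just the text between the parentheses and contains nothing but two-digit hex bytes and white space; `BytearrayDecoderSketch` and `Decode` are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for reading "bytearray ( ... )" operands of ldstr.
public static class BytearrayDecoderSketch
{
    // hexBody: the text between the parentheses, possibly spanning several lines.
    public static string Decode(string hexBody)
    {
        string[] tokens = hexBody.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        var bytes = new byte[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
        {
            bytes[i] = byte.Parse(tokens[i], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        }

        // Encoding.Unicode is UTF-16 little-endian, matching the byte order shown above.
        return Encoding.Unicode.GetString(bytes);
    }
}
```

If your ildasm output carries anything other than hex bytes inside the parentheses (trailing comments, for instance), that would need to be stripped before calling `Decode`.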
    

    Directly, “you can’t” (find a list of MSIL string escapes), but here are some helpful tidbits…

    ECMA-335, which contains the strict definition of CIL, does not specify which characters must be escaped in QSTRING literals, only that they may be escaped using the backslash \ character. The most important notes are:

    • Numeric character escapes are written in octal, not hexadecimal (i.e. \042, not \u0022).
    • Strings can be spread across multiple lines using the \ character (see below).

    The only explicitly mentioned escapes are tab \t, linefeed \n, and octal numeric escapes. This is a bit annoying for your purposes since C# has no octal literals; you’ll have to do your own extraction and conversion, for example with the Convert.ToInt32([string], 8) method.
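
As a rough illustration of the octal conversion, here is a small sketch built on `Convert.ToInt32(string, 8)`. It assumes an octal escape is a backslash followed by one to three octal digits; the names `OctalEscapeSketch` and `ReplaceOctalEscapes` are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: handles octal escapes only.
public static class OctalEscapeSketch
{
    // Replaces each \ooo sequence with the character whose code is that octal number.
    public static string ReplaceOctalEscapes(string text)
    {
        return Regex.Replace(
            text,
            @"\\([0-7]{1,3})",
            m => ((char)Convert.ToInt32(m.Groups[1].Value, 8)).ToString());
    }
}
```

A regular expression is the shortest way to show the conversion, but it does not know about escaped backslashes: in `\\042` it would still treat `\042` as an escape. A real parser should walk the literal left to right and handle every escape in a single pass.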

    Beyond that the choice of escapes is “implementation-specific” to the “hypothetical IL assembler” described in the spec. So your question rightly asks about the rules for MSIL, which is Microsoft’s strict implementation of CIL. As far as I can tell, MS has not documented their choice of escapes. It could be helpful at least to ask the Mono folks what they use. Beyond that, it may be a matter of generating the list yourself — make a program that declares a string literal for every character \u0000 – whatever, and see what the compiled ldstr statements are. If I get to it first, I’ll be sure to post my results.
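
One possible way to generate the list yourself is to produce the test program mechanically. The sketch below only builds the C# source text; compiling it and disassembling the result with ildasm is left to you. `EscapeProbeGeneratorSketch`, `Generate`, and the generated `Probe` class are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical generator for a probe program with one string literal per character.
public static class EscapeProbeGeneratorSketch
{
    // Builds C# source containing one single-character literal
    // for every UTF-16 code unit in [first, last].
    public static string Generate(int first, int last)
    {
        var source = new StringBuilder();
        source.AppendLine("static class Probe");
        source.AppendLine("{");
        source.AppendLine("    static void Main()");
        source.AppendLine("    {");
        for (int code = first; code <= last; code++)
        {
            // \uXXXX escapes keep the generated file itself plain ASCII.
            source.AppendLine(
                "        System.Console.WriteLine(\"\\u" + code.ToString("X4") + "\");");
        }
        source.AppendLine("    }");
        source.AppendLine("}");
        return source.ToString();
    }
}
```

Each generated statement should compile to its own ldstr instruction, so the disassembly can be compared line by line against the code points, for example with `Generate(0x00, 0xFF)` as a starting range.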

    Additional notes:

    To properly parse *IL string literals — known as QSTRINGS or SQSTRINGS — you will have to account for more than just character escapes. Take in-code string concatenation, for example (and this is verbatim from Partition II::5.2):

    The “+” operator can be used to concatenate string literals. This way, a long string can be broken across multiple lines by using “+” and a new string on each line. An alternative is to use “\” as the last character in a line, in which case, that character and the line break following it are not entered into the generated string. Any white space characters (space, line-feed, carriage-return, and tab) between the “\” and the first non-white space character on the next line are ignored. [Note: To include a double quote character in a QSTRING, use an octal escape sequence. end note]

    Example: The following result in strings that are equivalent to “Hello World from CIL!”:

    ldstr "Hello " + "World " + "from CIL!"
    
    ldstr "Hello World\ 
           \040from CIL!"
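
The concatenation and line-continuation rules quoted above could be handled along these lines. This is a sketch under assumptions, not a complete QSTRING parser: it takes the operand text after `ldstr`, assumes octal escapes have at most three digits, and maps any other escaped character to itself. `QStringOperandSketch` and `Parse` are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical parser for ldstr operands made of one or more quoted literals.
public static class QStringOperandSketch
{
    public static string Parse(string operand)
    {
        var result = new StringBuilder();
        int i = 0;
        while (i < operand.Length)
        {
            char c = operand[i];
            // White space and "+" between literals are treated as separators (lenient).
            if (char.IsWhiteSpace(c) || c == '+') { i++; continue; }
            if (c != '"') throw new FormatException("Expected a quote at position " + i);
            i++; // skip the opening quote

            while (true)
            {
                if (i >= operand.Length) throw new FormatException("Unterminated literal");
                c = operand[i++];
                if (c == '"') break;                      // closing quote
                if (c != '\\') { result.Append(c); continue; }
                if (i >= operand.Length) throw new FormatException("Dangling backslash");

                // Line continuation: backslash, optional spaces/tabs, then a line break.
                int j = i;
                while (j < operand.Length && (operand[j] == ' ' || operand[j] == '\t')) j++;
                if (j < operand.Length && (operand[j] == '\r' || operand[j] == '\n'))
                {
                    // Skip the break and all leading white space on the next line.
                    while (j < operand.Length && char.IsWhiteSpace(operand[j])) j++;
                    i = j;
                    continue;
                }

                char next = operand[i++];
                if (next >= '0' && next <= '7')
                {
                    // Octal escape: collect up to three digits, e.g. 040 is a space.
                    int start = i - 1;
                    while (i < operand.Length && i - start < 3
                           && operand[i] >= '0' && operand[i] <= '7') i++;
                    result.Append((char)Convert.ToInt32(operand.Substring(start, i - start), 8));
                }
                else if (next == 't') result.Append('\t');
                else if (next == 'n') result.Append('\n');
                else if (next == 'r') result.Append('\r');
                else result.Append(next);                 // \" \? \\ and similar
            }
        }
        return result.ToString();
    }
}
```

Both forms from the example above should come out of `Parse` as the same string. Note that this sketch also tolerates spaces between the backslash and the line break, which follows from the spec wording about ignoring white space after the "\".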
    
© 2021 The Archive Base. All Rights Reserved