
The Archive Base

Editorial Team
Asked: June 17, 2026

I need to debug an XML parser and I am wondering if I can


I need to debug an XML parser, and I am wondering if I can construct “malicious” input that will cause it to not recognize opening and closing tags correctly.

Additionally, where can I find this sort of information in general? After this I will also want to be sure that the parser I am working with won’t have trouble with other special characters such as &, =, ", etc.

1 Answer

  1. Editorial Team
    Added an answer on June 17, 2026 at 11:00 pm

    UTF-8 makes it very easy to figure out what the role of a code unit (i.e. a byte) is:

    • If the highest bit is not set, i.e. the code unit is 0xxxxxxx, then this byte expresses an entire code point, whose value is xxxxxxx (i.e. 7 bits of information).

    • If the highest bit is set and the code unit is 10xxxxxx, then it is a continuation part of a multibyte sequence, carrying six bits of information.

    • Otherwise, the code unit is the initial byte of a multibyte sequence, as follows:

      • 110xxxxx: Two bytes (one continuation byte), for 5 + 6 = 11 bits.
      • 1110xxxx: Three bytes (two continuation bytes), for 4 + 6 + 6 = 16 bits.
      • 11110xxx: Four bytes (three continuation bytes), for 3 + 6 + 6 + 6 = 21 bits.
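The classification above can be sketched as a small function. This is only an illustration of the bit patterns listed; the name `classify_byte` and the role labels are made up for this example.

```python
def classify_byte(b: int) -> str:
    """Return the role of one UTF-8 code unit (an int in 0..255)."""
    if (b & 0x80) == 0x00:
        return "single"        # 0xxxxxxx: a whole code point, 7 bits
    if (b & 0xC0) == 0x80:
        return "continuation"  # 10xxxxxx: 6 more bits of a multibyte sequence
    if (b & 0xE0) == 0xC0:
        return "lead2"         # 110xxxxx: starts a two-byte sequence
    if (b & 0xF0) == 0xE0:
        return "lead3"         # 1110xxxx: starts a three-byte sequence
    if (b & 0xF8) == 0xF0:
        return "lead4"         # 11110xxx: starts a four-byte sequence
    return "invalid"           # 11111xxx: not used by four-byte UTF-8


# '<' is one byte, 'é' is a lead byte followed by one continuation byte.
print([classify_byte(b) for b in "<é".encode("utf-8")])
# ['single', 'lead2', 'continuation']
```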

    As you can see, the value 60 (0x3C, the character <), which is 00111100, is a single-byte code point, and the same byte cannot occur as part of any multibyte sequence, because every byte of a multibyte sequence has its highest bit set.
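If you would rather check this claim than take it on trust, a brute-force sketch like the following encodes every Unicode scalar value and records which ones produce a given byte. The helper name `code_points_containing_byte` is hypothetical, and the loop covers about 1.1 million code points, so it takes a moment to run.

```python
def code_points_containing_byte(value: int) -> list:
    """Return every code point whose well-formed UTF-8 encoding contains `value`."""
    hits = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        if value in chr(cp).encode("utf-8"):
            hits.append(cp)
    return hits
```

Calling `code_points_containing_byte(0x3C)` should return a list holding only 0x3C itself, i.e. the byte appears nowhere except as the encoding of <.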

    The bit pattern can in principle be extended up to seven bytes, encoding up to 36 bits, but since Unicode only requires 21 bits, four bytes suffice, and current UTF-8 (RFC 3629) allows nothing longer. The standard mandates that a code point must be represented with the minimal number of code units.
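To see the minimal-length rule in practice, here is a small sketch using Python's built-in strict `utf-8` codec, which rejects overlong forms. The helper `decodes_as_utf8` and the sample byte strings are made up for this illustration; other decoders may behave differently, which is exactly why it is worth testing yours.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts `data`."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


# '<' is U+003C. Its only valid encoding is the single byte 0x3C.
valid_lt = b"<"
# Overlong forms: the same 7 bits padded with zeros into longer sequences.
overlong_lt_2 = bytes([0xC0, 0xBC])        # 11000000 10111100
overlong_lt_3 = bytes([0xE0, 0x80, 0xBC])  # 11100000 10000000 10111100

for sample in (valid_lt, overlong_lt_2, overlong_lt_3):
    print(sample.hex(), decodes_as_utf8(sample))
```

Byte strings like `overlong_lt_2` make reasonable test inputs for the parser you are debugging: a conforming decoder should refuse them rather than produce a <.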

    Update: As @Mark Tolonen rightly points out, you should check carefully whether each encoded code point is actually encoded with the minimal number of code units. If a browser inadvertently accepted an overlong encoding, a user could sneak something past you that you would not spot in a byte-for-byte analysis. As a starting point you could look for bytes like 10111100 (the low six bits of < in a continuation byte), but you’d have to check the entire multibyte sequence of which it is a part, since it can of course occur legitimately as part of other code points. Ultimately, if you can’t trust the browser, there is no real way around decoding everything and checking the resulting code point sequence for occurrences of U+003C etc., without bothering to look at the byte stream at all.
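A minimal sketch of the "decode everything, then inspect code points" approach might look like this. It is a hand-written decoder for illustration only, not how any particular browser or XML parser works; the names `decode_utf8_strict` and `contains_code_point` are hypothetical.

```python
def decode_utf8_strict(data: bytes) -> list:
    """Decode UTF-8 by hand into code points; raise ValueError on bad input."""
    # (lead-byte mask, lead-byte pattern, continuation bytes, smallest legal value)
    forms = [
        (0x80, 0x00, 0, 0x0000),
        (0xE0, 0xC0, 1, 0x0080),
        (0xF0, 0xE0, 2, 0x0800),
        (0xF8, 0xF0, 3, 0x10000),
    ]
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        for mask, pattern, extra, minimum in forms:
            if (lead & mask) == pattern:
                break
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        cp = lead & ~mask & 0xFF  # payload bits of the lead byte
        tail = data[i + 1 : i + 1 + extra]
        if len(tail) != extra or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError(f"truncated or malformed sequence at offset {i}")
        for b in tail:
            cp = (cp << 6) | (b & 0x3F)  # append 6 bits per continuation byte
        if cp < minimum:
            raise ValueError(f"overlong encoding of U+{cp:04X} at offset {i}")
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
        out.append(cp)
        i += 1 + extra
    return out


def contains_code_point(data: bytes, cp: int = 0x3C) -> bool:
    """Check the decoded code points, not the raw bytes, for `cp`."""
    return cp in decode_utf8_strict(data)
```

With this, `contains_code_point("a<b".encode("utf-8"))` is true, while the overlong input `bytes([0xC0, 0xBC])` raises `ValueError` instead of silently yielding a <.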
