The Archive Base

Editorial Team
Asked: May 23, 2026 (2026-05-23T19:08:41+00:00)

Yes, I really am going to ask about parsing XML with regexes… here goes.


I have some XML-ish data, and I need to parse it. I can’t do it completely with an XMLDocument or similar because it’s not proper XML, and I’m not sure I can (or want to) change the format. The main problem is tags which have special meaning, and look like this:

<$ something_here $>

C#’s XmlDocument falls over parsing that, and I assume other methods will too. I could, with a lot of work, change the above to something like

<some_special_tag><![CDATA[ something_here ]]></some_special_tag>

But that’s ugly, and I don’t really want to. The reason it would be time consuming to change is that I have hundreds, maybe thousands of XML documents which would need to be changed.

At the moment, I’m parsing the document with regexes. I only need to pick out a couple of specific tags (not the ones above), and it seems to be working, but I’m uncomfortable with it. I’m doing something like this at the moment:

...

MatchCollection mc = Regex.Matches(Template, "<tagname.*?/tagname>"); // or similar
foreach (Match m in mc) {

    try {

        XmlDocument xd = new XmlDocument();
        xd.LoadXml(m.Value);

...

This at least means I’m not using regexes exclusively 🙂

Can anyone think of a better way? Is there some way of getting XmlDocument to politely ignore the $ character that causes it to fall over? It doesn’t seem likely, but I thought I should at least get some opinions.


1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 7:08 pm (2026-05-23T19:08:42+00:00)

    No, there is no way to get XmlDocument to parse a document that isn’t XML, no matter how close to XML it might look!

    If it’s possible, I would definitely recommend converting your documents to actual XML (or at least to some recognised document format). Creating and maintaining a reliable parser for any format is a lot of work, let alone for a format that doesn’t appear to be rigorously defined.

    Using a some_special_tag element to identify special sections seems like a good idea to me. If necessary, you can put it in a different namespace to avoid clashes with other elements in your document. This is in fact exactly how XSLT works (“special” tags are used to mean special things, like templates or nodes that should be replaced) and exactly what XML was designed to support.
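
    As a rough illustration of the namespace idea, here is a minimal sketch that reads namespaced special elements with XmlDocument. The namespace URI urn:example:special, the sp prefix, and the SpecialTagDemo class are assumptions made up for this example, not part of the original documents.

```csharp
using System;
using System.Xml;

public static class SpecialTagDemo
{
    // Hypothetical namespace URI, chosen only for this illustration.
    public const string SpecialNs = "urn:example:special";

    // Returns the trimmed text of every some_special_tag element
    // that lives in the special namespace.
    public static string[] ReadSpecialValues(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // XPath needs a prefix-to-URI mapping to match namespaced elements.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("sp", SpecialNs);

        XmlNodeList nodes = doc.SelectNodes("//sp:some_special_tag", ns);
        var values = new string[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
        {
            values[i] = nodes[i].InnerText.Trim();
        }
        return values;
    }
}
```

    An element with the same local name but no namespace is not matched by the query, which is the point of using a namespace to avoid clashes.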

    Also, I don’t understand why you would need to place the something_here bit in a CDATA section. All characters that “break” XML can be escaped fairly easily (for example by writing < as &lt;). CDATA sections are generally only used when the contents of a node need so much escaping that it’s easier and less messy to just use a CDATA section instead.
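
    To show that escaping needs no CDATA, here is a small sketch in which XmlDocument does the escaping itself when text is assigned through InnerText. The EscapeDemo class and its method names are hypothetical, used only for this example.

```csharp
using System;
using System.Xml;

public static class EscapeDemo
{
    // Wraps arbitrary text in a some_special_tag element.
    // Assigning InnerText lets the XML API escape markup characters.
    public static string Wrap(string content)
    {
        var doc = new XmlDocument();
        XmlElement element = doc.CreateElement("some_special_tag");
        element.InnerText = content;
        return element.OuterXml;
    }

    // Parses the element back and returns the original, unescaped text.
    public static string Unwrap(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }
}
```

    Calling Unwrap(Wrap(text)) should give back the original text, so content containing < or & survives the round trip without a CDATA section.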

    Update: Regarding migration to a new format, can you not use both methods? Attempt to parse the document as XML (or, if there are performance concerns, perform some quicker test to determine whether the document is in the “old” or “new” format, such as checking for a version attribute in the root element). If that doesn’t work, fall back to the old method.

    This way, as long as everything is working fine (which it will be as long as nothing changes), users don’t need to modify their documents. However, if they run into problems or want to use any new features, explain to them that they must update their documents to the new format.
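
    The "try XML first, then fall back" approach could be sketched roughly as follows. The FormatDetector class and the "xml" / "legacy" return values are assumptions for illustration; a real version would call the existing regex-based code in the fallback branch.

```csharp
using System;
using System.Xml;

public static class FormatDetector
{
    // Returns "xml" when the text loads as well-formed XML,
    // otherwise "legacy" so the caller can use the old parser.
    public static string ChooseParser(string template)
    {
        try
        {
            var doc = new XmlDocument();
            doc.LoadXml(template);
            return "xml";
        }
        catch (XmlException)
        {
            // Not well-formed XML, e.g. it still contains <$ ... $> tags.
            return "legacy";
        }
    }
}
```

    Catching only XmlException keeps unrelated failures visible instead of silently routing them to the old parser.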

    Depending on how well your current “parser” works, you may even be able to provide an upgrade utility that automatically performs the conversion (as best it can).
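
    An upgrade utility along those lines might start from something like the sketch below. It assumes the special tags always look like <$ ... $>, never contain the sequence $> in their content, and that surrounding whitespace inside them is insignificant. The TemplateUpgrader class is hypothetical, and the rest of each document would still need to be well-formed XML.

```csharp
using System;
using System.Security;
using System.Text.RegularExpressions;

public static class TemplateUpgrader
{
    // Matches <$ ... $> lazily; Singleline lets the content span lines.
    private static readonly Regex SpecialTag =
        new Regex(@"<\$(.*?)\$>", RegexOptions.Singleline);

    // Rewrites each <$ ... $> as an escaped some_special_tag element.
    public static string Upgrade(string legacy)
    {
        return SpecialTag.Replace(legacy, match =>
        {
            string content = match.Groups[1].Value.Trim();
            // SecurityElement.Escape replaces <, >, &, " and ' with entities.
            return "<some_special_tag>"
                 + SecurityElement.Escape(content)
                 + "</some_special_tag>";
        });
    }
}
```

    The output of Upgrade can then be handed to XmlDocument.LoadXml as a check that the converted document really is XML.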
