The Archive Base


Editorial Team
Asked: May 22, 2026

I have started learning C# recently. MSDN has an example where you build an RSS application by fetching the XML file directly, so I tried something of my own and, like most times, I didn't get it right. Put the sigh sound here.

As the pages are HTML, I looked for HTML to XHTML converters, and I found a pretty interesting one called HTML-Cleaner.

It replaces unwanted tags with a <dd> tag, but I want to skip those tags instead, so I made a modification of my own:

public override bool Read()
{
  bool status = base.Read();
  if (status)
  {
    if (base.NodeType == XmlNodeType.Element)
    {
      dowrite = false;
      foreach (string line in AllowedTags)
      {
        if (base.Name == line ||
            (base.Name == "html" && first == false))
        {
          dowrite = true;
          first = true;
        }
      }

      // Got a node with prefix. This must be one of those "<o:p>"
      // or something else.  Skip this node entirely. We want prefix-
      // less nodes so that the resultant XML requires no namespace.
      if (base.Name.IndexOf(':') > 0)
        dowrite = false;

      if (!dowrite)
        base.Skip();
    }
  }
  return status;
}

The problem is it only prints one <dd> tag and nothing else. Even if allowed tags are present, it skips them.

Why is this happening? Any help will be greatly appreciated. If you have alternative approaches, please feel free to suggest them.


EDIT: Is there any way to achieve this?

1 Answer

    Editorial Team
    Added an answer on May 22, 2026 at 5:31 pm

    It looks like the Read method works on XML nodes rather than individual tags, so calling Skip on a non-matching element drops that element together with its entire contents.

    If the input is a typical HTML file, at some point while the document is being read the 'head' element will be found. It is not in the AllowedTags list, so it and all its descendant nodes will be skipped.

    The same applies to the body element. It and all its descendants will be skipped.

    That leaves the html element, which matches in your code and so gets inserted into the XML DOM.
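The effect described above can be reproduced in isolation. The following is a minimal sketch using the standard System.Xml XmlReader, not the HTML-Cleaner code itself; the class name SkipDemo and the method VisitedElements are made up for illustration. It records which elements the reader ever lands on when every disallowed element is dropped with Skip().

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only (not part of HTML-Cleaner).
public static class SkipDemo
{
    // Returns the names of the elements the reader lands on when every
    // element outside `allowed` is dropped with Skip().
    public static List<string> VisitedElements(string xml, ISet<string> allowed)
    {
        var visited = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            bool more = reader.Read();
            while (more)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    visited.Add(reader.Name);
                    if (!allowed.Contains(reader.Name))
                    {
                        // Skip() consumes the element AND its whole subtree,
                        // leaving the reader on the node after the end tag.
                        // Calling Read() here as well would lose that node.
                        reader.Skip();
                        continue;
                    }
                }
                more = reader.Read();
            }
        }
        return visited;
    }
}
```

With the input `<html><head><title>t</title></head><body><p>hi</p></body></html>` and an allowed set of "html" and "p", the reader lands on html, head and body only. The p element is allowed, but it sits inside the skipped body, so it is never reached.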

    Since html is not in the AllowedTags list, during the HTMLWriter phase, the html tags will get converted to dd tags, which is what you describe as your output.

    I'm not particularly keen on the html2xhtmlcleaner code, but I think you need to adapt the writer code rather than the reader code to achieve what you are trying to do.
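One way to decide at write time instead of read time is sketched below. It is an assumption about what such a change could look like, written against plain XmlReader and XmlWriter rather than the HTML-Cleaner writer class; UnwrapFilter and Filter are hypothetical names. A disallowed element loses its own tags, but the reader keeps descending into it, so allowed descendants are still written.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, for illustration only (not part of HTML-Cleaner).
public static class UnwrapFilter
{
    // Copies `xml` to a string, writing tags only for elements in `allowed`.
    // Children of a disallowed element are still processed.
    public static string Filter(string xml, ISet<string> allowed)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            // Dropping the root can leave several top-level nodes.
            ConformanceLevel = ConformanceLevel.Fragment
        };
        // One entry per open element: true if its start tag was written.
        var written = new Stack<bool>();

        using (var reader = XmlReader.Create(new StringReader(xml)))
        using (var writer = XmlWriter.Create(output, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    bool keep = allowed.Contains(reader.Name);
                    // Must be captured before moving onto the attributes.
                    bool isEmpty = reader.IsEmptyElement;
                    if (keep)
                    {
                        writer.WriteStartElement(reader.LocalName);
                        if (reader.MoveToFirstAttribute())
                        {
                            do
                            {
                                // Copy only prefix-less, non-xmlns attributes
                                // so the output needs no namespaces.
                                if (reader.Prefix.Length == 0 && reader.Name != "xmlns")
                                    writer.WriteAttributeString(reader.LocalName, reader.Value);
                            } while (reader.MoveToNextAttribute());
                            reader.MoveToElement();
                        }
                    }
                    if (isEmpty)
                    {
                        // <br/> style elements produce no EndElement node.
                        if (keep) writer.WriteEndElement();
                    }
                    else
                    {
                        written.Push(keep);
                    }
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    if (written.Pop()) writer.WriteEndElement();
                }
                else if (reader.NodeType == XmlNodeType.Text ||
                         reader.NodeType == XmlNodeType.CDATA ||
                         reader.NodeType == XmlNodeType.SignificantWhitespace)
                {
                    writer.WriteString(reader.Value);
                }
            }
        }
        return output.ToString();
    }
}
```

Called with `<html><body><div><p class="a">hi <b>there</b></p></div></body></html>` and an allowed set of "p" and "b", this should keep the p and b elements and drop the html, body and div tags around them. Note that text inside an unwrapped element also survives, so for elements such as head, where the content should disappear too, Skip() remains the right tool.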
