I am trying to convert the webpage into a plain text. But if I


Asked: May 23, 2026 (2026-05-23T15:47:54+00:00)


I am trying to convert a web page into plain text. But when I encounter a table, I get the td and tr tags too. If I replace those table tags, I lose some of the content.

Here is my code:

string s = Regex.Replace(htmldoc, "<script.*?</script>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);
s = Regex.Replace(s, "<!--.*?-->", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);
s = Regex.Replace(s, "<style.*?style>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);
s = Regex.Replace(s, "<a.*?a>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);
s = Regex.Replace(s, "<img.*?img>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);
s = Regex.Replace(s, "<table.*?table>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(s);
s = doc.DocumentNode.SelectSingleNode("//body").InnerText.Trim();

Please check it and tell me how I can get the contents of a table without getting the td and tr tags.


1 Answer

Editorial Team · Answer 1 · 2026-05-23T15:47:55+00:00


If you are using HTML Agility Pack to parse the table, you don’t need to remove the HTML tags with regexes first: the parser can walk the rows and cells and give you their text directly. There are some good examples of parsing tables with HTML Agility Pack here on SO, e.g. “HTML Agility pack – parsing tables”.
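
As a rough illustration of reading a table through the parser rather than stripping tags by hand, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced. The class name `TableText` and the method `ExtractRows` are hypothetical helpers made up for this example, not part of the library, and the tab-separated output is just one possible layout.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HTML Agility Pack.
public static class TableText
{
    // Returns one string per table row, with cell texts separated by tabs.
    public static List<string> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var lines = new List<string>();

        // SelectNodes can return null when nothing matches, so check first.
        var rows = doc.DocumentNode.SelectNodes("//table//tr");
        if (rows == null)
            return lines;

        foreach (HtmlNode row in rows)
        {
            // Direct child header and data cells of this row.
            var cells = row.SelectNodes("th|td");
            if (cells == null)
                continue;

            // InnerText drops the markup; DeEntitize turns &amp; into & and so on.
            var texts = cells.Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim());
            lines.Add(string.Join("\t", texts));
        }

        return lines;
    }
}
```

Calling `TableText.ExtractRows(htmldoc)` on the raw page, before any of the regex replacements, should give the cell contents without td or tr tags. Nested tables would be visited more than once by the `//table//tr` query, so a page with nested tables would likely need extra handling.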
