I have some HTML and I want to convert it to plain text, but keep only the tags for colored text.
For example, given the HTML below:
<body>
This is a <b>sample</b> html text.
<p align="center" style="color:#ff9999">this is only a sample</p>
....
and some other tags...
</body>
</html>
I want the output:
This is a sample html text.
<#ff9999>this is only a sample<>
....
and some other tags...
I’d use an HTML parser such as HtmlAgilityPack to parse the HTML, and a regular expression to find the `color` value in the attributes.

First, find all the nodes that have a `style` attribute with `color` defined in it by using XPath.

Then, the simplest regex to match a color value is `(?<=color:\s*)#?\w+`.

Then iterate through these nodes and, if there is a regex match, replace the inner HTML of the node with HTML-encoded tags (you’ll understand why a little bit later).
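To see what the regex extracts from a `style` value before wiring it into the parser, here is a minimal sketch that uses only the .NET base class library. The class and method names (`ColorRegexDemo`, `ExtractColor`) are made up for illustration. Note that the lookbehind also matches inside `background-color:`, so you may want to tighten the pattern if your HTML uses that property.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ColorRegexDemo
{
    // The regex from the answer: a lookbehind for "color:" plus optional
    // whitespace, then an optional '#' followed by word characters.
    private static readonly Regex ColorValue = new Regex(@"(?<=color:\s*)#?\w+");

    // Returns the first color value found in a style string, or null if none.
    // (Hypothetical helper, not part of any library.)
    public static string ExtractColor(string style)
    {
        Match match = ColorValue.Match(style);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractColor("color:#ff9999"));              // #ff9999
        Console.WriteLine(ExtractColor("font-size:12px; color: red")); // red
        Console.WriteLine(ExtractColor("font-size:12px") ?? "(no match)");
    }
}
```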
Finally, get the inner text of the document and HTML-decode it. This is why the markers were encoded earlier: inner text strips all real tags, so only the encoded markers survive and can be decoded back into `<...>` form.
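Putting the steps together, a sketch of the whole approach might look like the following. Several details are assumptions rather than part of the answer: the XPath expression `//*[contains(@style, 'color')]`, the use of `System.Net.WebUtility` for encoding and decoding, the `<>` closing marker (taken from the desired output in the question), and the hypothetical names `ColoredTextExtractor` and `ToColoredPlainText`. Walking the nodes in reverse is also an addition, so that nested colored elements are wrapped before their parents are rewritten.

```csharp
using System.Net;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class ColoredTextExtractor
{
    private static readonly Regex ColorValue = new Regex(@"(?<=color:\s*)#?\w+");

    // Hypothetical helper: converts HTML to plain text, keeping color markers.
    public static string ToColoredPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Every element whose style attribute mentions "color" (assumed XPath).
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//*[contains(@style, 'color')]");

        if (nodes != null)
        {
            // Reverse document order: descendants are handled before ancestors,
            // so rewriting a parent's InnerHtml does not discard earlier work.
            for (int i = nodes.Count - 1; i >= 0; i--)
            {
                HtmlNode node = nodes[i];
                Match match = ColorValue.Match(node.GetAttributeValue("style", ""));
                if (!match.Success) continue;

                // Wrap the content in HTML-encoded markers so they are treated
                // as text and survive the tag stripping done by InnerText.
                node.InnerHtml = WebUtility.HtmlEncode("<" + match.Value + ">")
                               + node.InnerHtml
                               + WebUtility.HtmlEncode("<>");
            }
        }

        // InnerText drops the real tags; decoding restores the markers.
        return WebUtility.HtmlDecode(doc.DocumentNode.InnerText);
    }
}
```

Calling `ColoredTextExtractor.ToColoredPlainText(html)` on the markup from the question would be the entry point. Whitespace and line breaks in the result follow whatever the source HTML contains, so some trimming may be needed afterwards.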
This should return something along the lines of the output requested in the question.
Of course, you could improve it for your needs.