I need to remove all ‘\n’ between ‘<’ and ‘>’ in an HTML file with C#.
My code is below:
Regex.Replace(text, "(<[^<>)]*)\\n+([^><]*>$)", "\1\2");
But it obviously doesn’t work. Any suggestions?
Example:
< style="
">
Detailed example:
<td colspan="3" rowspan="2">
<table cellpadding="0" cellspacing="0" class="a10" cols="13" id="t_5" lang="en-AU">
<tr id="t_5_FNHR">
<td class="a26" style="HEIGHT:5.00mm">
<div class="r11">LAKOTA - PINK PANTHER RETURNS-V</div>
</td>
<td class="a27" style="
">
<div class="r11">5c</div>
</td>
Another:
<td class="a34" style="
">
<div class="r11">7,390.62</div>
</td>
<td class="a35" style="
">
<div class="r11">617.81</div>
</td>
<td class="a36" style="
">
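An easy but obviously brittle way would be to remove all line breaks where the next angle bracket is a >.
Explanation: if only characters other than < and > sit between a line break and the next >, that line break lies inside a tag; if a < comes first, it lies between tags and is left alone.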
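A minimal sketch of that idea in C#, assuming a lookahead-based pattern (the pattern itself is an assumption, not taken from the post above, and the names TagNewlineStripper and StripNewlinesInsideTags are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagNewlineStripper
{
    // Assumed pattern: a line break (\n or \r\n) that is followed by any run of
    // characters other than '<' and '>' and then a '>'. The lookahead (?=...)
    // only checks what follows, so nothing but the line break itself is removed.
    private static readonly Regex LineBreakInsideTag =
        new Regex(@"\r?\n(?=[^<>]*>)");

    // Returns the input with every line break inside a tag removed.
    public static string StripNewlinesInsideTags(string html)
    {
        return LineBreakInsideTag.Replace(html, string.Empty);
    }
}
```

Calling TagNewlineStripper.StripNewlinesInsideTags(text) on the examples from the question should join style=" and "> onto one line while keeping the line breaks between tags. The brittleness is that a literal > in text content, a comment, or a script block also counts as "the next angle bracket", so a line break in front of it would be removed as well.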
This might be good enough for your case (if it isn’t, regex is probably not the right tool anyway).