In ASP.NET MVC 3, I created an action filter that removes white space from the entire HTML output. It works as expected most of the time, but now I need to change the regex so that it does not touch anything inside a pre element.
I got the regex logic from Mads Kristensen's awesome blog, and I am not sure how to modify it for this purpose.
Here is the logic:
public override void Write(byte[] buffer, int offset, int count) {
    string HTML = Encoding.UTF8.GetString(buffer, offset, count);
    Regex reg = new Regex(@"(?<=[^])\t{2,}|(?<=[>])\s{2,}(?=[<])|(?<=[>])\s{2,11}(?=[<])|(?=[\n])\s{2,}");
    HTML = reg.Replace(HTML, string.Empty);
    buffer = System.Text.Encoding.UTF8.GetBytes(HTML);
    this.Base.Write(buffer, 0, buffer.Length);
}
Whole code of the filter:
Any idea?
EDIT:
BIG NOTE:
My intention is not at all to speed up the response time. In fact, this may slow things down. The pages are GZipped, and this minification saves me only approximately 4 to 5 KB per page, which is nothing.
Parsing HTML with regex is very complicated, and any simple solution can break easily. (Use the right tool for the job.) That being said, I'll show a simple solution.
First I simplified the regex you had to:
Replace those matches with an empty string to get rid of double spaces everywhere.
Assuming there are no < or > inside the pre tag, you can add (?![^<>]*</pre>) at the end of the expression to make it fail inside of pre tags. This makes sure that </pre> doesn't follow the current match without any tags in between. Resulting in:
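To see how that lookahead behaves, here is a minimal sketch. It assumes a base pattern of (?<=\s)\s+ (whitespace that directly follows other whitespace), which is only an illustrative stand-in and not necessarily the simplified expression referred to above. The WhitespaceMinifier class and its Collapse method are hypothetical names made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceMinifier
{
    // Assumed base pattern (illustrative only): one or more whitespace
    // characters that directly follow another whitespace character.
    // The trailing negative lookahead is the piece described above: the
    // match fails when nothing but non-tag characters sits between it
    // and a closing </pre>.
    private static readonly Regex ExtraWhitespace =
        new Regex(@"(?<=\s)\s+(?![^<>]*</pre>)", RegexOptions.Compiled);

    // Removes the extra whitespace everywhere except inside pre elements
    // whose content contains no < or > characters.
    public static string Collapse(string html)
    {
        return ExtraWhitespace.Replace(html, string.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<div>  <p>a  b</p>\n\n<pre>x   y\n  z</pre>  </div>";

        // Runs of whitespace outside the pre element shrink to a single
        // character; the content of the pre element is left untouched.
        Console.WriteLine(WhitespaceMinifier.Collapse(html));
    }
}
```

The same caveat applies as in the text: if the pre element contains nested tags such as code or span, whitespace that appears before those inner tags is no longer protected, because the lookahead stops at the first < or > it meets.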