I’m parsing some HTML using regex, and I want to match lines that start with a word rather than an HTML tag, skipping any leading whitespace. Using C# regex, my first pattern was:
pattern = @"^\s*([^<])";
which attempts to consume all the leading whitespace and then capture the first non-'<' character. Unfortunately, if the line contains only whitespace before the first '<', the regex backtracks and captures the last whitespace character before the '<'. I would like the match to fail in that case.
Any ideas?
Asked the question too soon; I just worked out this:
pattern = @"^\s*((?!\s)[^<]+)";
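For anyone comparing the two patterns, here is a minimal sketch that runs both against a few sample lines. The class name, the `Capture` helper, and the sample strings are made up for illustration and are not part of my utility. It assumes each line is matched on its own, so no `RegexOptions.Multiline` is needed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingTextDemo
{
    // First attempt: \s* can backtrack and give one space back to [^<].
    public static readonly Regex Naive = new Regex(@"^\s*([^<])");

    // Revised pattern: (?!\s) stops the capture from starting on whitespace.
    public static readonly Regex Fixed = new Regex(@"^\s*((?!\s)[^<]+)");

    // Hypothetical helper: returns group 1, or null when the line does not match.
    public static string Capture(Regex regex, string line)
    {
        Match m = regex.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string[] samples = { "   <p>text</p>", "  Hello <b>world</b>", "<div>" };
        foreach (string line in samples)
        {
            Console.WriteLine("[{0}] naive=[{1}] fixed=[{2}]",
                line,
                Capture(Naive, line) ?? "no match",
                Capture(Fixed, line) ?? "no match");
        }
    }
}
```

With the revised pattern, the whitespace-then-tag line and the tag-only line both fail to match, while the text line captures everything up to the first '<'. That capture keeps any trailing whitespace (here "Hello " with the space), so a `Trim()` on the result may still be wanted.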
Thanks for the feedback about regex and HTML; I’ll bear it in mind for the future. I’m writing a utility program to make a few pages multi-language (i.e., add asp:Literal controls for hardcoded text, etc.). I think regex is sufficient for this purpose, but if there are better tools, please let me know (web stuff isn’t my area…).