I was going over a piece of code, and I came across this regular expression:
Regex _fileOrImageRegex = new Regex("<\\s*(?<Tag>(applet|embed|frame|iframe|img|link|script|xml))\\s*.*?(?<AttributeName>(src|href|xhref))\\s*=\\s*([\\\"\\'])(?<FileOrImage>.*?)\\3", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);
Can someone please explain the expression to me in plain words? It's being used to parse all the images, and I get that part. I also want to modify the regular expression to include the alt attribute for every image tag it matches.
Thanks
Required link: RegEx match open tags except XHTML self-contained tags
In English, what it does is this:
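- `<` matches the start of an HTML open tag.
- `\s*` matches any amount of whitespace (tabs, spaces, newlines), including none at all.
- `(?<Tag>` starts a named group: whatever the parentheses match is stored in the results under the name Tag.
- The next lump is the possible values for open tags: `applet`, `embed`, etc. The `()` around the values mean "store this value in a subpattern, and make it available as part of my results". The `|` means "or", so `applet` or `embed`, etc. This looks at tag names.
- `\s*` is more whitespace.
- `.*?` means "any amount of anything, but as little as possible". Normally that excludes newlines, but because of the `Singleline` flag (see comments for this answer) it really does match "any amount of anything".
- `(?<AttributeName>(src|href|xhref))` works the same way as the tag group above, this time for the optional values `src`, `href` and `xhref`. These are the tag attributes.
- `\s*=\s*` means "any amount of whitespace, followed by an equals sign, followed by any amount of whitespace".
- `([\"\'])`: for the `()`, see above. The `[]` mean "any one of these characters", and `\"` and `\'` are the " and ' characters, escaped with backslashes (the backslashes appear doubled in the question because of string escaping in the source code).
- `(?<FileOrImage>.*?)` is another named group, and the `.*?` inside it again means "as little of anything as possible", so it captures the attribute value up to the closing quote.
- `\3` at the end is a backreference: it matches whatever the quote group `([\"\'])` matched, so a value opened with " has to be closed with ", and a value opened with ' has to be closed with '.

The options at the end are modifiers: IgnoreCase makes it case insensitive, Singleline lets `.` match newlines as described above, and someone else will tell you what Compiled means, because I don't know the language the regex is written for 🙂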
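To check the walkthrough against real input, here is a minimal C# sketch that runs the expression from the question over a small piece of HTML and prints what each group captured. The sample markup and the names `html` and `rows` are made up for illustration. One detail worth knowing: by default .NET numbers the unnamed groups first and the named groups after them, which is why `\3` (and `Groups[3]`) is the quote character here and not the attribute name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The expression from the question, unchanged.
var fileOrImageRegex = new Regex(
    "<\\s*(?<Tag>(applet|embed|frame|iframe|img|link|script|xml))\\s*.*?(?<AttributeName>(src|href|xhref))\\s*=\\s*([\\\"\\'])(?<FileOrImage>.*?)\\3",
    RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

// Hypothetical sample input: mixed case, a newline inside a tag, both quote styles.
string html =
    "<p>hi</p>\n" +
    "<IMG class=\"logo\"\n  src='pics/logo.png'>\n" +
    "<script src=\"app.js\"></script>";

var rows = new List<string>();

foreach (Match m in fileOrImageRegex.Matches(html))
{
    string tag = m.Groups["Tag"].Value;                 // named group
    string attribute = m.Groups["AttributeName"].Value; // named group
    string quote = m.Groups[3].Value;                   // unnamed group 3, the one \3 refers to
    string value = m.Groups["FileOrImage"].Value;       // named group

    rows.Add($"{tag} | {attribute} | {quote} | {value}");
}

foreach (string row in rows)
{
    Console.WriteLine(row);
}
// Prints:
// IMG | src | ' | pics/logo.png
// script | src | " | app.js
```

The `<p>` tags are skipped because `p` is not in the list of tag names, `IMG` matches thanks to IgnoreCase, and the newline between `class` and `src` is crossed because Singleline lets `.` match it.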
Edit: You've just updated the first post a little. The `<Tag>` and `<AttributeName>` give the match groups a name, so in the result of running the regex you can look each value up by name (Tag, AttributeName, FileOrImage) instead of by position.

By the way, congratulations on having an awesome name 😀
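On the other half of the question, picking up the alt attribute: one possible approach, sketched below, is to leave the original expression alone and run a second, small pattern over each matched img tag. This is only an illustration under some assumptions, not the original author's code. `altRegex`, `images` and the sample HTML are hypothetical names and data, and the tag is assumed to end at the first `>` after the match starts, which is not true if an attribute value itself contains `>`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The expression from the question, unchanged.
var fileOrImageRegex = new Regex(
    "<\\s*(?<Tag>(applet|embed|frame|iframe|img|link|script|xml))\\s*.*?(?<AttributeName>(src|href|xhref))\\s*=\\s*([\\\"\\'])(?<FileOrImage>.*?)\\3",
    RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

// Hypothetical helper pattern (not in the original code): finds alt="..." or alt='...'
// inside a single tag. The quote is unnamed group 1, so \1 closes with the same quote.
var altRegex = new Regex(
    "\\salt\\s*=\\s*([\\\"\\'])(?<Alt>.*?)\\1",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

// Made-up sample input: alt before src, alt after src, and no alt at all.
string html =
    "<img alt=\"Company logo\" src=\"logo.png\">\n" +
    "<img src='photo.jpg' alt='A photo'>\n" +
    "<img src=\"spacer.gif\">\n" +
    "<script src=\"app.js\"></script>";

var images = new List<(string Source, string Alt)>();

foreach (Match m in fileOrImageRegex.Matches(html))
{
    // Only img tags are of interest here; skip script, link, and so on.
    if (!m.Groups["Tag"].Value.Equals("img", StringComparison.OrdinalIgnoreCase))
    {
        continue;
    }

    // Naive assumption: the tag ends at the first '>' after the start of the match.
    int end = html.IndexOf('>', m.Index);
    string tagText = end >= 0
        ? html.Substring(m.Index, end - m.Index + 1)
        : html.Substring(m.Index);

    // An empty string stands in for "no alt attribute found" in this sketch.
    Match alt = altRegex.Match(tagText);
    string altText = alt.Success ? alt.Groups["Alt"].Value : "";

    images.Add((m.Groups["FileOrImage"].Value, altText));
}

foreach (var image in images)
{
    Console.WriteLine($"{image.Source} -> '{image.Alt}'");
}
// Prints:
// logo.png -> 'Company logo'
// photo.jpg -> 'A photo'
// spacer.gif -> ''
```

Doing it in two passes means the alt attribute is found whether it comes before or after src. Keep in mind that the `.*?` in the original expression can run past the closing `>` of a tag that has no src or href of its own, so for anything beyond simple, well-formed markup an HTML parser is likely the safer tool, as the linked question argues.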