————-test.hta file code ————
<!DOCTYPE html>
<html>
<head>
<title>dead</title>
</head>
<body>
txt<textarea id="content" >
<input name="" type="text" class="qu_te1n05ew" value="请输入您的E-mail地址" />
<input name="" type="submit" class="qu_sbt02" value="提 交" />
</textarea>
<button onclick="startCls();">start</button>
<script>
// Accept either an element id or the element itself.
function getObj(id) {
    return 'string' == typeof id ? document.getElementById(id) : id;
}
// Run the regex against the textarea content and show the first match.
function startCls() {
    var txt = getObj('content').value;
    var srcRe = /<\w+(?:\s[^<>]*(?:(?:'[^']*')|(?:"[^"]*"))?[^<>]*)*\s+src\s*\=\s*["']?(?:[^"' <>]*\/)?([^\/"'<>]+\.(?:gif|jpg|png))['" ](?:\s[^<>]*(?:(?:'[^']*')|(?:"[^"]*"))?[^<>]*)*\/?>/ig;
    alert(srcRe.exec(txt));
}
</script>
</body>
</html>
————code end——-
Why does srcRe.exec(txt) seem to loop forever and hang the HTA? With other test strings it works.
The regex is meant to match an img tag's src attribute and capture just the file name. It should not match a src that is not inside a complete tag, like <b><img src="ss.gif" </b>, because that img has no closing > and so is not a valid HTML tag.
The sub-pattern (?:\s[^<>]*(?:(?:'[^']*')|(?:"[^"]*"))?[^<>]*)* means: a < or > may appear only inside '' or "", and every other character must not be < or >. The whole match starts with < and ends with >.
I’m not going to debug this ghastly regex. But I can tell you why it fails. Breaking it down for “readability”:
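One possible way to lay the pattern out piece by piece, as a sketch in JavaScript. The names parts, source and rebuilt are made up for illustration; joining the pieces should give back the same pattern as srcRe in the question.

```javascript
var parts = [
  /<\w+/,                                             // "<" and the tag name
  /(?:\s[^<>]*(?:(?:'[^']*')|(?:"[^"]*"))?[^<>]*)*/,  // anything before src (repeated group)
  /\s+src\s*\=\s*/,                                   // the src attribute name and "="
  /["']?/,                                            // optional opening quote
  /(?:[^"' <>]*\/)?/,                                 // optional path, up to the last "/"
  /([^\/"'<>]+\.(?:gif|jpg|png))/,                    // captured: file name ending in .gif, .jpg or .png
  /['" ]/,                                            // closing quote or a space
  /(?:\s[^<>]*(?:(?:'[^']*')|(?:"[^"]*"))?[^<>]*)*/,  // anything after src (same repeated group)
  /\/?>/                                              // optional "/" and the closing ">"
];

// Glue the pieces back together into one regex.
var source = '';
for (var i = 0; i < parts.length; i++) {
  source += parts[i].source;
}
var rebuilt = new RegExp(source, 'ig');

// A short, well-formed tag is matched quickly.
console.log(rebuilt.exec('<img src="pics/ss.gif" />'));
```

Written this way, it is easier to see that the file-name piece is mandatory and that the same repeated group sits on both sides of it.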
You can see that this can only match if there is a .gif or .jpg or .png in your string. Which there isn't, so the regex has to fail.
The problem now is that the regex engine takes a long time to figure this out because there are several instances of [^<>]* in your regex, all of which can (and will try to) match the entire tag's contents, and (to add insult to injury) all of which are even enclosed in repeating groups. See line 3, broken down:
There are gazillions of permutations that the regex engine has to check before being able to declare failure. In short, it's not an infinite loop, but a regex like this with input like that will likely keep your computer busy until hell freezes over.
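To get a feel for how fast the work grows, here is a rough sketch that runs only the repeated group from the question against input that can never match. The function name timeFailure, the fake attribute text ' a' and the trailing '!' are assumptions for illustration, and the sizes are kept small on purpose so that it finishes.

```javascript
// The repeated group from the question, anchored at the start and followed
// by a character ('!') that never occurs in the input below, so every
// attempt has to fail after all alternatives have been tried.
var group = /^(?:\s[^<>]*(?:(?:'[^']*')|(?:"[^"]*"))?[^<>]*)*!/;

// Build n fake "attributes" (" a a a ..."), run the regex, and report the
// result (always null here) together with the elapsed milliseconds.
function timeFailure(n) {
  var input = new Array(n + 1).join(' a');
  var start = new Date().getTime();
  var result = group.exec(input);
  return { n: n, result: result, ms: new Date().getTime() - start };
}

// Keep these small: every extra " a" multiplies the number of ways the
// two [^<>]* parts and the outer repetition can divide up the input.
var sizes = [4, 6, 8, 10, 12];
for (var i = 0; i < sizes.length; i++) {
  var r = timeFailure(sizes[i]);
  console.log('n=' + r.n + ' matched=' + (r.result !== null) + ' ms=' + r.ms);
}
```

On a backtracking engine the reported time should climb steeply as n grows while the result stays null. The tag in the question is far longer than these inputs, which is why it appears to hang.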
Hint 1: Read this tutorial on catastrophic backtracking.
Hint 2: Don’t use regexes to parse HTML. At least not if you don’t know exactly what you’re doing.
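As a rough sketch of Hint 2, and assuming the code runs where a DOM is available (a browser or the HTA itself), the HTML parser can find the img tags and a much simpler check can pull out the file name. The function names imageFileName and imageFileNames are hypothetical. Note that a parser is more forgiving than the regex in the question: it may still report an img from broken markup like <b><img src="ss.gif" </b>, and assigning innerHTML may cause the images to be requested.

```javascript
// Pure helper: drop any query string, fragment and path, then keep the
// name only if it ends in .gif, .jpg or .png. Returns null otherwise.
function imageFileName(src) {
  var name = String(src).split(/[?#]/)[0].split('/').pop();
  return /\.(?:gif|jpg|png)$/i.test(name) ? name : null;
}

// DOM only (browser or HTA): let the HTML parser locate the img tags
// instead of describing tag syntax with a regex.
function imageFileNames(html) {
  var box = document.createElement('div');
  box.innerHTML = html;                       // parsed, never attached to the page
  var imgs = box.getElementsByTagName('img');
  var names = [];
  for (var i = 0; i < imgs.length; i++) {
    var name = imageFileName(imgs[i].getAttribute('src') || '');
    if (name) {
      names.push(name);
    }
  }
  return names;
}
```

In startCls this could be called as imageFileNames(getObj('content').value). The only regexes left contain no nested repetition, so input without any image fails quickly.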