After losing a lot of sleep, I still can’t figure this out:
The code below (a simplification of larger code that isolates the problem) identifies Item1 and Item2 in Firefox (FF) but not in IE7. I have no idea why.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
</head>
<body>
<table><tr>
<td><img src=imgs/site/trash.jpg border=1></td><td><font style="">Item1</font></td>
<td><img src=imgs/site/trash.jpg border=1></td><td><font style="">Item2</font></td>
</tr></table>
<script type="text/javascript">
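// Lazily skip from "trash" to the next font tag, then capture the text inside it.
var _pattern = /trash.*?<font.*?>(.*)<\/font>/gim;
alert(_pattern);
// Serialize the whole document back to markup.
var thtml = document.documentElement.innerHTML;
alert(thtml);
// Declare _match so it does not leak into the global scope. With the g flag,
// exec() resumes from _pattern.lastIndex on each call and returns null when done.
var _match;
while ((_match = _pattern.exec(thtml)) !== null) {
  alert(_match[1]);
}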
</script>
</body>
</html>
Notes:
1. I know there are better ways to get Item1 and Item2; this example is only meant to show the regex problem I’m facing in the simplest way.
2. When I remove the table and /table tags, it works.
Thanks in advance
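The problem is that the any-character token . does not match a newline character. This is not really a JScript bug: in JavaScript, . never matches a line terminator, and the m (multiline) flag only changes how ^ and $ behave. Your pattern works in Firefox only because its innerHTML keeps each trash image and the following font tag on the same line.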
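A quick way to see this is to test the same lazy skip against a string that contains a line break. The sample string below is a hypothetical approximation of IE-style markup with a line break between two cells, not output captured from IE7.

```javascript
// "." never matches a line terminator in JavaScript; the m flag does not change that.
const sample = "trash.jpg\"></TD>\r\n<TD><FONT style=\"\">Item1</FONT>";

const withDot = /trash.*?<font/i.test(sample);         // "." stops at \r\n
const withDotAndM = /trash.*?<font/im.test(sample);    // m only affects ^ and $
const withClass = /trash[\s\S]*?<font/i.test(sample);  // [\s\S] matches any character

console.log(withDot, withDotAndM, withClass); // false false true
```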
Use this regex instead:
This eliminates . altogether; note that [\s\S] is equivalent to . except that it also matches a newline.
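As a sketch of the idea, the pattern below replaces every . with either [\s\S] or a negated character class. It is an illustrative pattern, not necessarily the exact regex the answer refers to, and the input string is a hypothetical imitation of IE-style markup (uppercase tags, line breaks between cells).

```javascript
// Hypothetical markup in the style IE's innerHTML might produce.
const ieStyleHtml =
  '<TABLE><TBODY><TR>\r\n' +
  '<TD><IMG src="imgs/site/trash.jpg" border=1></TD>\r\n' +
  '<TD><FONT style="">Item1</FONT></TD>\r\n' +
  '<TD><IMG src="imgs/site/trash.jpg" border=1></TD>\r\n' +
  '<TD><FONT style="">Item2</FONT></TD></TR></TBODY></TABLE>';

// [\s\S]*? crosses line breaks; [^>]* and [^<]* keep each match inside one font element.
const pattern = /trash[\s\S]*?<font[^>]*>([^<]*)<\/font>/gi;

function extractItems(html) {
  const items = [];
  pattern.lastIndex = 0; // reset, because the g flag makes exec() stateful
  let match;
  while ((match = pattern.exec(html)) !== null) {
    items.push(match[1]);
  }
  return items;
}

console.log(extractItems(ieStyleHtml)); // [ 'Item1', 'Item2' ]
```

Using [^<]* for the captured text also avoids the greedy (.*) in the original pattern, which would run past the first closing font tag if two items ever ended up on the same line.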
Removing the table changes things because IE’s .innerHTML does not return the original markup it received. Instead, it generates the markup on the fly by walking the DOM, and when it encounters a table element it inserts newlines in different places than it does when the table is absent. As a result, the text between "trash" and the following font tag spans a line break, which . cannot cross.
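The effect of the serialization can be reproduced without a browser by running the question’s original pattern over two versions of the same markup: one with each image and font on the same line (roughly what Firefox returns here) and one with a line break inserted between cells. The second string is a hypothetical stand-in for IE’s output; the exact places IE inserts newlines may differ.

```javascript
// The question's original pattern, unchanged.
const originalPattern = /trash.*?<font.*?>(.*)<\/font>/gim;

function collect(pattern, html) {
  const found = [];
  pattern.lastIndex = 0; // reset the g-flag state between inputs
  let m;
  while ((m = pattern.exec(html)) !== null) found.push(m[1]);
  return found;
}

// Firefox-like: each trash image and its font tag on the same line.
const sameLine =
  '<td><img src="imgs/site/trash.jpg" border="1"></td><td><font style="">Item1</font></td>\n' +
  '<td><img src="imgs/site/trash.jpg" border="1"></td><td><font style="">Item2</font></td>';

// IE-like (hypothetical): a line break between every pair of cells.
const splitLines = sameLine.replace(/<\/td><td>/g, '</td>\r\n<td>');

console.log(collect(originalPattern, sameLine));   // [ 'Item1', 'Item2' ]
console.log(collect(originalPattern, splitLines)); // []
```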