I am doing some crude parsing of JavaScript code, in JavaScript. I’ll spare you the details of why I need to do this, but suffice it to say that I don’t want to integrate a huge chunk of library code: it is unnecessary for my purposes, and it is important that I keep this very lightweight and relatively simple. So please don’t suggest I use JSLint or anything like that. If the answer is more code than you can paste into your answer, it’s probably more than I want.
My code currently does a good job of detecting quoted sections and comments, and then matching braces, brackets and parens (making sure not to be confused by the quotes and comments, or by escapes within quotes, of course). This is all I need it to do, and it does it well, with one exception:
It can be confused by regular expression literals. So I’m hoping for some help with detecting regular expression literals in a string of JavaScript, so I can handle them appropriately.
Something like this:
function getRegExpLiterals (stringOfJavascriptCode) {
    var output = [];
    // todo!
    return output;
}
var jsString = "var regexp1 = /abcd/g, regexp2 = /efg/;";
console.log (getRegExpLiterals (jsString));
// should print:
// [{startIndex: 14, length: 7}, {startIndex: 33, length: 5}]
es5-lexer is a JS lexer that uses a very accurate heuristic to distinguish regular expressions in JS code from division expressions. It also provides a token-level transformation that you can use to make sure that the resulting program will be interpreted the same way by a full JS parser as by the lexer.
The bit that determines whether a "/" starts a regular expression is in guess_is_regexp.js, and the tests start at scanner_test.js line 401.
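If pulling in es5-lexer is still too much, here is a much cruder previous-token rule that fits in a few dozen lines and fills in the question's getRegExpLiterals. It is not a port of guess_is_regexp.js. The keyword list, the treatment of ")", "]" and "}", the helper name slashStartsRegex and the token types are all my own assumptions. It skips strings and comments (assuming ES5 input, so no template literals), then treats a "/" as the start of a regex when the previous significant token is nothing at all, punctuation other than ")" or "]", or one of a handful of keywords such as return or typeof.

```javascript
// Crude sketch, NOT es5-lexer's algorithm. slashStartsRegex and the token
// types below are hypothetical names chosen for this example.

// Assumed list of keywords after which "/" begins a regex, not a division.
const KEYWORDS_BEFORE_REGEX = new Set([
  "return", "typeof", "instanceof", "in", "new", "delete", "void",
  "throw", "case", "do", "else"
]);

// prev is null at the start of input, or {type: "word" | "value" | "punct", text}.
function slashStartsRegex(prev) {
  if (prev === null) return true;
  if (prev.type === "word") return KEYWORDS_BEFORE_REGEX.has(prev.text);
  if (prev.type === "value") return false; // after a string or regex literal
  // After ")" or "]" assume division; after other punctuation (including "}") assume regex.
  return prev.text !== ")" && prev.text !== "]";
}

function getRegExpLiterals(src) {
  const output = [];
  let prev = null; // last significant (non-whitespace, non-comment) token
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    if (/\s/.test(c)) { i++; continue; }
    if (c === "/" && src[i + 1] === "/") { // line comment: skip to newline
      const end = src.indexOf("\n", i);
      i = end === -1 ? src.length : end;
      continue;
    }
    if (c === "/" && src[i + 1] === "*") { // block comment: skip past "*/"
      const end = src.indexOf("*/", i + 2);
      i = end === -1 ? src.length : end + 2;
      continue;
    }
    if (c === '"' || c === "'") { // string literal, honouring backslash escapes
      let j = i + 1;
      while (j < src.length && src[j] !== c) j += src[j] === "\\" ? 2 : 1;
      i = j + 1;
      prev = { type: "value", text: "string" };
      continue;
    }
    if (/[A-Za-z0-9_$]/.test(c)) { // identifier, keyword or number
      let j = i;
      while (j < src.length && /[A-Za-z0-9_$]/.test(src[j])) j++;
      prev = { type: "word", text: src.slice(i, j) };
      i = j;
      continue;
    }
    if (c === "/" && slashStartsRegex(prev)) {
      let j = i + 1;
      let inClass = false; // inside [...] an unescaped "/" does not close the literal
      while (j < src.length && src[j] !== "\n") {
        const d = src[j];
        if (d === "\\") { j += 2; continue; } // skip escaped character
        if (d === "[") inClass = true;
        else if (d === "]") inClass = false;
        else if (d === "/" && !inClass) break;
        j++;
      }
      j++; // step past the closing "/"
      while (j < src.length && /[A-Za-z]/.test(src[j])) j++; // flags such as g, i, m
      output.push({ startIndex: i, length: j - i });
      prev = { type: "value", text: "regex" };
      i = j;
      continue;
    }
    // Any other single character, including a division "/".
    prev = { type: "punct", text: c };
    i++;
  }
  return output;
}

// Logs two entries: startIndex 14 (length 7) and startIndex 33 (length 5).
console.log(getRegExpLiterals("var regexp1 = /abcd/g, regexp2 = /efg/;"));
```

Known misfires of this rule: `if (x) /re/.test(y)` is read as a division because of the ")", and `a++ / 2` is read as a regex because the previous token is "+". If cases like those matter, a fuller heuristic such as es5-lexer's is the place to look.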