As part of a Java webapp I’m working on, I need to add a prefix to some URIs that are loaded from a database, e.g.
"controller.jsp?page=list"
becomes…
<a href="${legacyBaseUrl}/controller.jsp?page=list">...</a>
Easy. The problem is that some of the URIs in the database contain JavaScript, e.g.
"javascript:window.open('controller.jsp?page=popup')"
What I’d like to be able to do is…
<a href="javascript:window.open('${legacyBaseUrl}/controller.jsp?page=popup')">...</a>
or better yet…
<a href="${legacyBaseUrl}/controller.jsp?page=popup" target="_blank">...</a>
I know I can just chop it apart with regular expressions, but I’m wary of treating this as a simple string manipulation problem, as the data has never been sanitized and there could be any JavaScript in the database.
Is there a (relatively) simple way to parse JavaScript properly in Java, and recognize/extract calls to window.open or other JS functions?
I’ve looked briefly at stuff like Rhino or javax.script, but am a bit lost. Is this the right thing for my needs? Would a regex actually be enough? Any suggestions?
It seems you need a fully functional HTML parser, and probably a JavaScript parser as well. There are a lot of pure Java implementations, e.g.
http://www.webrenderer.com/products/server/product/
HtmlUnit
http://lobobrowser.org/java-browser.jsp
Jakarta Cactus
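If most of the stored values turn out to follow the two shapes shown in the question, a narrower option might be to rewrite only what matches exactly and flag everything else for manual review, rather than parsing arbitrary JavaScript. The sketch below assumes that; the class name LegacyLinkRewriter, the Link holder and the strict pattern are all hypothetical, and it uses only java.util.regex. It is not a JavaScript parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LegacyLinkRewriter {

    // Assumption: accept ONLY javascript:window.open('<url>') with a single
    // quoted argument and an optional trailing semicolon. Anything more
    // complex (extra arguments, nested calls, escapes) will not match.
    private static final Pattern SIMPLE_WINDOW_OPEN = Pattern.compile(
        "^\\s*(?i:javascript):\\s*window\\.open\\(\\s*(['\"])([^'\"\\\\()]+)\\1\\s*\\)\\s*;?\\s*$");

    /** Hypothetical result holder: what to render for one stored value. */
    public static final class Link {
        public final String href;
        public final boolean newWindow;   // render target="_blank" when true
        public final boolean recognized;  // false: left untouched, needs review

        Link(String href, boolean newWindow, boolean recognized) {
            this.href = href;
            this.newWindow = newWindow;
            this.recognized = recognized;
        }
    }

    public static Link rewrite(String baseUrl, String stored) {
        String value = stored.trim();
        Matcher m = SIMPLE_WINDOW_OPEN.matcher(value);
        if (m.matches()) {
            // Known popup shape: pull out the URL and prefix it.
            return new Link(baseUrl + "/" + m.group(2), true, true);
        }
        if (value.regionMatches(true, 0, "javascript:", 0, "javascript:".length())) {
            // Unknown script: do not guess, return it unchanged and flagged.
            return new Link(value, false, false);
        }
        // Plain relative URI: just add the prefix.
        return new Link(baseUrl + "/" + value, false, true);
    }
}
```

Values that come back with recognized set to false can be logged and counted first; if there are only a few, fixing them by hand in the database may be cheaper than bringing in a full parser. The resulting href still needs the usual HTML attribute escaping when it is written into the page.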