Do you happen to know of an open-source Java component that can scan a set of dynamic pages (JSP) and extract all the input parameters from them? Of course, a crawler can only crawl static content, not dynamic code, but my idea is to extend it to crawl a web server including all the server-side code. Naturally, I am assuming that the tool will have full access to the crawled web server, rather than relying on any hacks.
The idea is to build a static analyzer that can detect all parameters (request.getParameter() and the like) in all dynamic pages.
You cannot use a web crawler (basically, an HTML parser) to extract request parameters. At most, it can scan the HTML structure. You can use, for example, Jsoup for this:
This currently prints:

Form found: action=, method=
Input found: name=hl, value=en
Input found: name=source, value=hp
Input found: name=ie, value=ISO-8859-1
Input found: name=q, value=
Input found: name=btnG, value=Google Search
Input found: name=btnI, value=I'm Feeling Lucky
Input found: name=, value=
Form found: action=/search, method=
Input found: name=hl, value=en
Input found: name=source, value=hp
Input found: name=ie, value=ISO-8859-1
Input found: name=q, value=
Input found: name=btnG, value=Google Search
Input found: name=btnI, value=I'm Feeling Lucky

If you want to scan the JSP source code for any forms/inputs, then you have to look in a different direction; that is definitely not a “web crawler”. Unfortunately, no such static analysis tool comes to mind. The closest you can get is to create a Filter which logs all submitted request parameters.
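If a rough approximation of the static scan is good enough, a plain text search over the JSP sources may already get you part of the way. The following is only a minimal sketch, not a real static analyzer: the class name JspParamScanner and its methods are made up for illustration, it assumes Java 11+ and UTF-8 encoded .jsp files, and it only catches getParameter()/getParameterValues() calls whose argument is a string literal. Names built at runtime, EL expressions such as ${param.foo}, and parameters read in servlets or tag files are all missed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: a regex-based approximation, not a full JSP parser.
class JspParamScanner {

    // Matches getParameter("x") and getParameterValues("x") with a literal argument.
    private static final Pattern PARAM_CALL = Pattern.compile(
            "\\bgetParameter(?:Values)?\\s*\\(\\s*\"([^\"]+)\"\\s*\\)");

    // Returns the sorted set of literal parameter names found in one source text.
    static Set<String> extractParams(String source) {
        Set<String> names = new TreeSet<>();
        Matcher m = PARAM_CALL.matcher(source);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Walks a directory tree and collects parameter names from every .jsp file.
    static Set<String> scanDirectory(Path root) throws IOException {
        List<Path> jspFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            jspFiles = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jsp"))
                    .collect(Collectors.toList());
        }
        Set<String> names = new TreeSet<>();
        for (Path file : jspFiles) {
            names.addAll(extractParams(Files.readString(file)));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the webapp root is passed as the first argument.
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        scanDirectory(root).forEach(System.out::println);
    }
}
```

Run it against the webapp directory, e.g. java JspParamScanner.java /path/to/webapp. Combining its output with what a logging Filter sees at runtime would give a fuller picture than either approach alone.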