I want to access forms on HTML pages from Java without involving a real browser.
At present I am doing it with HtmlUnit, but it takes a bit too long to load each page. When it comes to accessing millions of pages, that extra time per page matters a lot.
Are there any other ways of doing this?
I’ve used something similar called HttpUnit before, but I have no idea how it compares performance-wise.
If you have millions of pages to process, I would recommend throwing some more threads at it. Just a guess, but I think that if you scale this up to multiple threads, you’ll run out of bandwidth before you run out of CPU power (in which case it won’t matter how much faster the library itself could be).
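To make the "more threads" idea concrete, here is a rough sketch using a fixed-size thread pool from `java.util.concurrent`. The class name `ParallelPageProcessor`, the `processAll` method, and the `fetcher` function are all made up for illustration; in practice `fetcher` would wrap whatever per-page work you already do (for example your HtmlUnit form handling). The `main` method uses a stub instead of real network access, and the pool size of 4 is an arbitrary assumption you would have to tune.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelPageProcessor {

    // Runs `fetcher` once per URL on a fixed-size thread pool and
    // returns the results keyed by URL, in the order the URLs were given.
    // `fetcher` is a hypothetical stand-in for the real page-loading code.
    public static Map<String, String> processAll(List<String> urls, int threads,
                                                 Function<String, String> fetcher)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every page as its own task so the pool can overlap the waits.
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fetcher.apply(url);
                futures.add(pool.submit(task));
            }
            // Collect results; get() blocks until the matching task has finished
            // and rethrows any failure as an ExecutionException.
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                results.put(urls.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URLs and a stub fetcher; no network access happens here.
        List<String> urls = Arrays.asList(
                "http://example.com/a", "http://example.com/b", "http://example.com/c");
        Map<String, String> results =
                processAll(urls, 4, url -> "processed " + url);
        results.forEach((url, result) -> System.out.println(url + " -> " + result));
    }
}
```

If the real fetcher uses shared state, it needs to be safe to call from several threads at once; with HtmlUnit I would assume one client object per thread rather than a single shared one. For millions of pages you would also want to submit work in batches instead of holding every `Future` in memory at the same time.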