Happy New Year, everybody. I am trying to develop my own bot (a web crawler) that will walk around the Internet for a search engine. I am thinking of using the JBoss scheduler-service to schedule the bot, and something like this to get content:
URL u = new URL("http://www.google.kz");
InputStream in = u.openStream();
I want to ask: which EJB3 or JBoss features should I use to develop my bot effectively (the right way)?
I am new to EJB3 and JBoss.
If you have better ideas, you could write them here. I am developing a search engine to practice my Java skills and for academic purposes; I am not going to compete with Google 🙂
- jboss-5.1.0.GA
- XP
- EJB3
- Eclipse Helios
P.S. I haven’t decided yet how I will parse HTML. I am thinking about something like this: Parse HTML. What can you recommend?
You don’t need EJB or JBoss at all. In fact, I can hardly think of a use for them in a web crawler. The only exception, perhaps, is if you are using JPA to store the results of the crawl: then you can make use of container-managed transactions and the automatic injection of the JPA entity manager. Apart from that, no.
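To illustrate the point, here is a minimal sketch of how the scheduling and fetching could be done with plain Java SE, no container involved. It is only one possible approach, not a complete crawler: the class name SimpleCrawler, the helper methods readAll, fetch and start, the 5-second timeouts, the 10-second delay and the UTF-8 charset are all assumptions made for the example, and it uses lambdas, so it assumes Java 8 or later. ScheduledExecutorService stands in for the JBoss scheduler-service here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical class name; a sketch of a container-free crawler skeleton.
public class SimpleCrawler {

    // Reads the whole stream into a String using the given charset.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    // Downloads one page. UTF-8 is an assumption; a real crawler would
    // look at the Content-Type header to pick the charset.
    public static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        conn.setConnectTimeout(5000); // milliseconds, arbitrary choice
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }

    // Runs crawlTask repeatedly, waiting `delay` between the end of one run
    // and the start of the next. Runtime exceptions are caught because an
    // uncaught exception would cancel all further runs of the task.
    public static ScheduledExecutorService start(Runnable crawlTask, long delay, TimeUnit unit) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        Runnable safeTask = () -> {
            try {
                crawlTask.run();
            } catch (RuntimeException e) {
                System.err.println("Crawl run failed: " + e);
            }
        };
        executor.scheduleWithFixedDelay(safeTask, 0, delay, unit);
        return executor;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable crawlOnce = () -> {
            try {
                String html = fetch("http://www.google.kz");
                System.out.println("Fetched " + html.length() + " characters");
            } catch (IOException e) {
                System.err.println("Fetch failed: " + e.getMessage());
            }
        };
        ScheduledExecutorService executor = start(crawlOnce, 10, TimeUnit.SECONDS);
        Thread.sleep(30_000);   // let it run for a while, then stop
        executor.shutdownNow();
    }
}
```

If you later add JPA for storing results, that storage part is where a container could start to pay off; the fetching and scheduling above would not need to change.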