hi all I’m writing a simple web crawling script that needs to connect to


Asked: May 15, 2026


Hi all, I'm writing a simple web crawling script that needs to connect to a web page, follow 302 redirects automatically, give me the final URL, and let me grab the HTML.

What's the preferred Java library for this kind of thing?

thanks


1 Answer

Editorial Team · Answer 1 · 2026-05-15T14:19:13+00:00


You can use Apache HttpComponents Client for this (or, in plain vanilla Java SE, the built-in but verbose URLConnection API). For the HTML parsing, traversal, and manipulation part, Jsoup may be useful.
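As a rough illustration of the plain Java SE route, here is a minimal sketch using `HttpURLConnection`. The class name `RedirectFetcher`, the `Page` holder, the timeouts, and the `User-Agent` string are all made up for this example, and it assumes the page is UTF-8 encoded. `HttpURLConnection` follows redirects by default, and after the response has been read `getURL()` reports the URL it ended up at. One caveat: it does not follow redirects that switch protocol (for example http to https), so a real crawler may need to handle those itself or use a fuller client.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class RedirectFetcher {

    /** Holds the URL reached after redirects and the body fetched from it. */
    public static final class Page {
        public final String finalUrl;
        public final String html;

        Page(String finalUrl, String html) {
            this.finalUrl = finalUrl;
            this.html = html;
        }
    }

    /** Fetches startUrl, following same-protocol redirects (requires Java 9+). */
    public static Page fetch(String startUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(startUrl).toURL().openConnection();
        conn.setInstanceFollowRedirects(true); // already the default, stated for clarity
        conn.setConnectTimeout(10_000);        // illustrative timeouts, in milliseconds
        conn.setReadTimeout(10_000);
        conn.setRequestProperty("User-Agent", "ExampleCrawler/0.1"); // hypothetical name
        try (InputStream in = conn.getInputStream()) {
            // Assumption: the body is UTF-8; a real crawler should check Content-Type.
            String html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            // After the response is read, getURL() is the URL after redirects.
            return new Page(conn.getURL().toString(), html);
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling `RedirectFetcher.fetch(someUrl)` returns both pieces the question asks for: `finalUrl` and `html`. The HTML string could then be handed to a parser such as Jsoup.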

Note that a halfway decent crawler should obey robots.txt. You may also want to take a look at existing Java-based web crawlers, like ~~J-Spider~~ Apache Nutch.
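To give an idea of what obeying robots.txt involves, here is a deliberately simplified sketch. The class `RobotsRules` and its methods are hypothetical names for this example. It only honors `Disallow` prefixes in the `User-agent: *` group and ignores `Allow` lines, wildcards, and agent-specific groups, so treat it as a starting point and not as a complete implementation of the robots exclusion rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    /** Parses robots.txt text, keeping only Disallow rules from "User-agent: *" groups. */
    public static RobotsRules parse(String robotsTxt) {
        RobotsRules rules = new RobotsRules();
        boolean inWildcardGroup = false;
        boolean lastWasUserAgent = false;
        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.replaceFirst("#.*$", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines belong to the same group.
                if (!lastWasUserAgent) {
                    inWildcardGroup = false;
                }
                if (value.equals("*")) {
                    inWildcardGroup = true;
                }
                lastWasUserAgent = true;
            } else {
                lastWasUserAgent = false;
                // An empty Disallow value means "nothing is disallowed".
                if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                    rules.disallowed.add(value);
                }
            }
        }
        return rules;
    }

    /** Returns false if the path starts with any collected Disallow prefix. */
    public boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

A crawler would fetch `/robots.txt` once per host, pass its text to `RobotsRules.parse`, and call `isAllowed` with each URL's path before requesting it. Existing crawlers such as Apache Nutch already handle this, which is one reason to look at them first.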
