The Archive Base

Editorial Team
Asked: May 23, 2026

I would like to scan some websites looking for broken links, preferably using Java.

I would like to scan some websites looking for broken links, preferably using Java. Any hints on how I can start doing this?

(I know there are some websites that do this, but I want to produce my own customized log file.)

1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 6:43 pm

    Writing a web crawler isn’t as simple as just reading the static HTML: if the page uses JavaScript to modify the DOM, it gets complex. You will also need to keep track of pages you’ve already visited so the crawler doesn’t get caught in spider traps. If the site is pure static HTML, then go for it. But if the site uses jQuery and is large, expect it to be complex.

    If your site is all static, small, and has little or no JavaScript, then use the answers already listed.

    Or

    You could use Heritrix and then parse its crawl.log for 404s afterwards (see the Heritrix documentation on crawl.log).
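As a rough illustration of the log-parsing step, the sketch below filters crawl.log lines for a 404 status. It assumes the whitespace-separated layout described in the Heritrix documentation (timestamp, status code, size, URI, discovery path, referrer, and so on); the class and method names are made up for this example, and the column positions should be checked against the Heritrix version you run.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: pull 404 entries out of Heritrix crawl.log lines. */
class CrawlLogScanner {

    // Assumed field order (whitespace separated):
    // [0] timestamp, [1] status code, [2] size, [3] URI,
    // [4] discovery path, [5] referrer, ...
    static List<String> find404s(List<String> logLines) {
        List<String> broken = new ArrayList<>();
        for (String line : logLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 6) {
                continue; // skip malformed or truncated lines
            }
            if ("404".equals(fields[1])) {
                // Report the dead URI together with the page that linked to it.
                broken.add(fields[3] + " (linked from " + fields[5] + ")");
            }
        }
        return broken;
    }
}
```

Reading the file with `Files.readAllLines(Path.of("crawl.log"))` and passing the result to `find404s` would give the entries to write into a custom log.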

    Or, if you must write your own:

    You could use something like HtmlUnit (it has a JavaScript engine) to load the page, then query the DOM for links. Place each link in an “unvisited” queue, then pull links from that queue to get the next URL to load. If a page fails to load, report it.
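The queue-driven loop described above could look roughly like this. To keep the sketch self-contained, page loading sits behind a hypothetical `PageLoader` interface rather than calling HtmlUnit directly; `QueueCrawler` and `crawl` are likewise illustrative names. A plain set of already-queued URLs is included so that the loop terminates on sites whose pages link back to each other.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical abstraction: returns the links on a page, or throws if the page fails to load. */
interface PageLoader {
    List<String> loadLinks(String url) throws Exception;
}

class QueueCrawler {

    /** Crawls outward from startUrl and returns every URL that failed to load. */
    static List<String> crawl(String startUrl, PageLoader loader) {
        Deque<String> unvisited = new ArrayDeque<>();
        Set<String> seen = new HashSet<>(); // URLs that have already been queued
        List<String> broken = new ArrayList<>();

        unvisited.add(startUrl);
        seen.add(startUrl);

        while (!unvisited.isEmpty()) {
            String url = unvisited.poll();
            try {
                for (String link : loader.loadLinks(url)) {
                    if (seen.add(link)) { // add() returns false for a repeat
                        unvisited.add(link);
                    }
                }
            } catch (Exception e) {
                broken.add(url); // the page failed to load: report it
            }
        }
        return broken;
    }
}
```

With HtmlUnit, a loader would typically wrap `WebClient.getPage(url)` and collect the href of each anchor returned by `HtmlPage.getAnchors()`; check the HtmlUnit documentation for the version you use.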

    To avoid duplicate pages (spider traps), you could hash each link and keep a hash table of visited pages (see CityHash). Before placing a link into the unvisited queue, check it against the visited hash table.
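A minimal sketch of the visited check follows. It swaps CityHash for Java's built-in `HashSet`, which hashes the strings itself; storing a compact fixed-size hash instead would mainly matter for memory use on very large crawls. The normalization rules (lowercase scheme and host, drop the fragment and user info, treat an empty path as "/") are assumptions for illustration, and `VisitedUrls` is a made-up name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class VisitedUrls {
    private final Set<String> visited = new HashSet<>();

    /** Returns true the first time a URL is seen, false on repeats. */
    boolean markVisited(String url) {
        return visited.add(normalize(url));
    }

    /** Reduces equivalent spellings of a URL to one key (assumed rules, see lead-in). */
    static String normalize(String url) {
        String trimmed = url.trim();
        int hash = trimmed.indexOf('#');
        String noFragment = hash >= 0 ? trimmed.substring(0, hash) : trimmed;
        try {
            URI u = new URI(noFragment).normalize(); // resolves "." and ".." segments
            if (u.getScheme() == null || u.getHost() == null) {
                return noFragment; // relative or opaque URI: leave as-is
            }
            String path = (u.getRawPath() == null || u.getRawPath().isEmpty())
                    ? "/" : u.getRawPath();
            StringBuilder key = new StringBuilder();
            key.append(u.getScheme().toLowerCase(Locale.ROOT))
               .append("://")
               .append(u.getHost().toLowerCase(Locale.ROOT));
            if (u.getPort() != -1) {
                key.append(':').append(u.getPort());
            }
            key.append(path);
            if (u.getRawQuery() != null) {
                key.append('?').append(u.getRawQuery());
            }
            return key.toString();
        } catch (URISyntaxException e) {
            return noFragment; // unparseable: fall back to the raw string
        }
    }
}
```

Calling `markVisited(link)` before queuing a link and skipping it when the result is false gives the check described above.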

    To avoid leaving your site, check that the URL is in a safe-domain list before adding it to the unvisited queue. If you want to confirm that the off-domain links are good, keep them in a separate offDomain queue. Then later load each link from this queue using URL.getContent() (for example, new URL(link).getContent(), which throws an IOException if the fetch fails) to see if it works. This is faster than using HtmlUnit, and you don’t need to parse the page anyway.
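One possible shape for the domain check and the lightweight off-domain test is sketched below. The allow-list of hosts and the names `DomainFilter`, `isOnSite`, and `responds` are hypothetical. Instead of URL.getContent(), the sketch sends a HEAD request through `HttpURLConnection` so that the status code is available without downloading the body; some servers reject HEAD, in which case a GET would be needed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

class DomainFilter {
    private final Set<String> safeHosts; // hypothetical allow-list of lowercase host names

    DomainFilter(Set<String> safeHosts) {
        this.safeHosts = safeHosts;
    }

    /** True if the URL's host is on the allow-list, so it may join the unvisited queue. */
    boolean isOnSite(String url) {
        try {
            String host = new URI(url).getHost();
            return host != null && safeHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Checks an off-domain link without parsing the page; true if the status is below 400. */
    static boolean responds(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000); // milliseconds; arbitrary value for the sketch
            conn.setReadTimeout(5000);
            int status = conn.getResponseCode();
            conn.disconnect();
            return status < 400;
        } catch (IOException | RuntimeException e) {
            return false; // malformed URL, non-HTTP scheme, or network failure
        }
    }
}
```

Links for which `isOnSite` is false would go into the offDomain queue and be passed to `responds` once the main crawl has finished.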
