The Archive Base

Editorial Team
Asked: June 9, 2026


I am creating a web crawler using Java EE technologies. I have created a crawler service that holds the crawler's results as CrawlerElement objects, which contain the information I am interested in.

Currently I am using the jsoup library for this, but it is not reliable. I attempt the connection three times with a timeout of 10 seconds, and it is still unreliable.
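The "try three times" idea above can be sketched independently of any HTTP library. This is only an illustration under assumptions: `RetryingFetch` and `withRetries` are hypothetical names (not part of jsoup or the JDK), the doubling delay between attempts is an added design choice, and the failing task in `main` simulates timeouts instead of calling a real server.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryingFetch {

    /**
     * Calls the task up to maxAttempts times. After each failed attempt it
     * sleeps, doubling the delay every time, and rethrows the last
     * IOException if no attempt succeeds. (Hypothetical helper.)
     */
    static <T> T withRetries(int maxAttempts, long initialDelayMillis, Callable<T> task)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e;                      // remember the failure
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);       // back off before the next attempt
                    delay *= 2;
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated fetch: fails twice with an IOException, then succeeds.
        String result = withRetries(3, 10, () -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated timeout");
            }
            return "ok after " + calls[0] + " attempts";
        });
        System.out.println(result); // prints "ok after 3 attempts"
    }
}
```

In a crawler, the lambda would wrap the real fetch call. Only IOException is retried here; any other exception propagates immediately.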

By unreliable I mean that even when a page is publicly accessible, the crawler program cannot access it. I know this could be due to a robots.txt exclusion, but the pages are allowed there and access is still unreliable.

So I decided to go with the URLConnection object, which I obtain from URL.openConnection() and then call connect() on.

I have one more requirement that is bugging me: I have to get the response time in milliseconds for a CrawlerElement, meaning how long it took to load page B from page A. I checked the methods of URLConnection and found no way to do that.

Any ideas on this topic? Can anyone help me?

I was thinking of recording the current time in milliseconds before and after the code that gets the content, subtracting the two, and saving the difference in the database, but I was wondering whether that would be accurate.
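The before/after timing idea could look roughly like the sketch below. It is an illustration under assumptions, not a definitive implementation: `ResponseTimer`, `elapsedMillis`, and `fetch` are hypothetical names, the 10-second timeouts mirror the ones mentioned above, and the measurement covers the whole request including reading the body. It uses `System.nanoTime()`, which is intended for measuring elapsed time, in place of `System.currentTimeMillis()`, which can jump if the system clock is adjusted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

public class ResponseTimer {

    /** Runs the task and returns how long it took, in milliseconds. */
    static long elapsedMillis(Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Sends a GET request, reads the whole body, and returns the byte count. */
    static long fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(10_000); // 10 s to establish the connection
        conn.setReadTimeout(10_000);    // 10 s of silence allowed while reading
        long total = 0;
        try (InputStream in = conn.getInputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        } finally {
            conn.disconnect();
        }
        return total;
    }

    public static void main(String[] args) {
        String address = args.length > 0
                ? args[0]
                : "http://www.javacoffeebreak.com/faq/faq0079.html";
        try {
            long ms = elapsedMillis(() -> fetch(address));
            System.out.println("Fetched " + address + " in " + ms + " ms");
        } catch (Exception e) {
            System.out.println("Request failed: " + e);
        }
    }
}
```

The number obtained this way includes DNS lookup, connection setup, server processing, and download time as seen from the crawler's machine, so it varies from run to run. Averaging several measurements may give a steadier figure to store.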

Thanks in advance.

EDIT: CURRENT IMPLEMENTATION

My current implementation, which gives me the status code, content type, etc.:

import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

public class GetContent {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.javacoffeebreak.com/faq/faq0079.html");
        long startTime = System.currentTimeMillis();
        URLConnection uc = url.openConnection();
        uc.setRequestProperty("Authorization", "Basic bG9hbnNkZXY6bG9AbnNkM3Y=");
        uc.setRequestProperty("User-Agent", "");
        // connect() only opens the connection; the response is fetched by the
        // header calls further down, so the time printed next excludes it.
        uc.connect();
        long endTime = System.currentTimeMillis();
        // Elapsed time in milliseconds for opening the connection
        System.out.println(endTime - startTime);
        String contentType = uc.getContentType();
        System.out.println(contentType);
        // For HTTP connections, header field 0 is typically the status line,
        // e.g. "HTTP/1.1 200 OK", not just the numeric code.
        String statusCode = uc.getHeaderField(0);
        System.out.println(statusCode);
    }
}

What do you say: is it okay to do it this way, or should I use heavier APIs like Apache HttpClient or Apache Nutch?


1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 5:18 pm

    OK, so you have already done the work and are running into problems with that API/library. I know it is painful to build something and then throw that code away and switch to another library, but if that is possible for you, I suggest doing so: jsoup is just a parser library and may cause you more problems in the future, so I recommend using one of these more stable APIs. You can also use crawler4j for that purpose.
    Here is the list of some open source crawler APIs; with some R&D you can find a good solution for this 🙂


© 2021 The Archive Base. All Rights Reserved