The Archive Base
Editorial Team
Asked: June 5, 2026


Thing is: I have a webcrawler framework, and independent modules that implement this framework. All of these modules capture news from news-specific websites.

The framework handles two unpredictable errors, IOException and SocketTimeoutException, for obvious reasons: the website may be offline and/or under maintenance.
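Because `java.net.SocketTimeoutException` is itself a subclass of `IOException`, the order of the catch clauses matters when a framework wants to treat the two cases differently. Below is a minimal sketch, assuming a hypothetical `FetchErrors` class with a `FetchOutcome` enum and a `classify` helper (none of these are part of the original framework):

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

public class FetchErrors {

    // Hypothetical outcome categories; the real framework may model these differently.
    public enum FetchOutcome { OK, TIMEOUT, IO_ERROR, OTHER_ERROR }

    // Runs one fetch and maps any failure to an outcome instead of propagating it.
    public static FetchOutcome classify(Callable<?> fetch) {
        try {
            fetch.call();
            return FetchOutcome.OK;
        } catch (SocketTimeoutException e) {
            // Must come first: SocketTimeoutException extends IOException, so
            // placing it after the IOException clause would not compile.
            return FetchOutcome.TIMEOUT;
        } catch (IOException e) {
            // Site offline, connection refused, or an HTTP error status;
            // Jsoup reports statuses such as 403 as an IOException subclass.
            return FetchOutcome.IO_ERROR;
        } catch (Exception e) {
            return FetchOutcome.OTHER_ERROR;
        }
    }
}
```

In a crawler module the lambda would wrap the existing call, e.g. `FetchErrors.classify(() -> Jsoup.connect(url).get())`, so a random 403 lands in `IO_ERROR` rather than `TIMEOUT`.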

On one specific website (the one in the code below), I get random IOExceptions all the time. I have tried to predict them, but I still don’t know why I’m getting this error.

I assumed it was caused by bombarding the site with requests during the test phase. It is not: even after 2 or 3 days without sending another request, it still throws the error.

In a nutshell: the site does not require authentication, yet it returns a 403 at random. RANDOMLY.

Since a 403 can mean several different things, I’d like to see what the specific problem with my application is.

If I could find out which 403 it is (403.1, 403.2, …, 403.n), I could try to work around it.

// If you guys want the code, it's a basic Jsoup get.
// (I have also tried it with the native API,
// and I still get the same random 403 errors.)
// Note that I also tried it without following redirects, and I still get the error.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document doc = Jsoup
    .connect("http://www.agoramt.com.br/")
    .timeout(60000)          // request timeout, in milliseconds
    .followRedirects(true)
    .get();                  // throws an IOException subclass on an HTTP 403

// You may criticize the code, but this specific line is the one
// that throws the error, and it doesn't do that randomly for the other 3k
// site connections. That's why I want to get the specifics of the 403.
1 Answer

    Editorial Team
    Added an answer on June 5, 2026 at 10:56 pm

    A server may return a 403 on a whim. You are not expected to resolve this on your end, other than by respecting the server’s wish not to let you in. You may try to read the response body for details provided by the server, but that’s probably all you’ll get. The 403.n error codes you are looking for are, I believe, an IIS-specific feature, and the site you pointed out seems to be served by nginx, so don’t expect to get those.
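To follow the suggestion of reading whatever the server sends back, here is a minimal sketch using the native `java.net.http` client (Java 11+), which returns a 403 as an ordinary response instead of throwing. The class and method names (`ForbiddenProbe`, `describe`, `probe`) and the 200-character body excerpt are hypothetical; the 60-second timeouts mirror the question’s Jsoup call. It reports the status code, the `Server` header (one way to check nginx versus IIS) and the start of the body. Note that an HTTP status code is always a plain three-digit integer, so a sub-status such as 403.1 cannot appear in `statusCode()`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ForbiddenProbe {

    // Summarizes what the server told us: integer status, Server header, body excerpt.
    public static String describe(int status, HttpHeaders headers, String body) {
        String server = headers.firstValue("Server").orElse("unknown");
        String trimmed = body.strip();
        String excerpt = trimmed.length() > 200 ? trimmed.substring(0, 200) + "..." : trimmed;
        return "status=" + status + " server=" + server + " body=" + excerpt;
    }

    // Sends one GET. java.net.http does not throw on 4xx/5xx statuses,
    // so a 403 arrives as a normal response whose headers and body can be read.
    public static String probe(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(60))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(60))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return describe(response.statusCode(), response.headers(), response.body());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(probe("http://www.agoramt.com.br/"));
    }
}
```

If you prefer to stay with Jsoup, calling `.ignoreHttpErrors(true)` on the connection and then `.execute()` instead of `.get()` should similarly give you a `Connection.Response` whose `statusCode()`, `header("Server")` and `body()` you can log whenever the 403 shows up.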

    If your webcrawler randomly gets a 403 but a regular web browser (from the same IP) never does, the best I can suggest is to make your webcrawler’s request headers look exactly like what a regular web browser would send. Whether that is proper behavior for a polite webcrawler is a different discussion.
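A minimal sketch of attaching browser-like headers with the native `java.net.http` API. The header values below are assumptions (a generic desktop Chrome user agent and a `pt-BR` language preference guessed from the `.com.br` domain); capture the real values from your own browser’s developer tools. The class name `BrowserLikeRequest` is hypothetical, and the method only builds the request without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class BrowserLikeRequest {

    // Assumed example values; replace them with what your browser actually sends.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0 Safari/537.36";
    static final String ACCEPT =
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    static final String ACCEPT_LANGUAGE = "pt-BR,pt;q=0.9,en;q=0.8";

    // Builds (but does not send) a GET request carrying browser-like headers.
    public static HttpRequest build(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(60))
                .header("User-Agent", USER_AGENT)
                .header("Accept", ACCEPT)
                .header("Accept-Language", ACCEPT_LANGUAGE)
                .GET()
                .build();
    }
}
```

The request can then be sent with `HttpClient.send(...)`. With Jsoup, the equivalent would be adding `.userAgent(...)` and `.header("Accept-Language", ...)` to the existing `Jsoup.connect(...)` chain.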


Related Questions

I have this thing that i need to do and some advices will be
I have this very unpeculiar exception.The thing is have i have this app that
It's really simple, I just want one screen. Wow, that [shiny thing] must have
I have this thing working mostly. What I don't get is, if I have
One thing I have in mind is, that datasets in Core Data (or lets
One thing I have noticed on some sites is that they use one BIIIIIIIG
Ok this is the weirdest thing I have seen in a while... I am
i am new to this backgroundworker thing i have read some articles about how
I have a server that I'm taking down. The only thing I have left
I am in need to do this thing: I have an object like below,

© 2021 The Archive Base. All Rights Reserved