
The Archive Base


Editorial Team
Asked: May 10, 2026

A web crawler script that spawns at most 500 threads and each thread basically


A web crawler script spawns at most 500 threads, and each thread requests certain data from a remote server; each server’s reply differs from the others in content and size.

I’m setting the thread stack size to 756 KB:

threading.stack_size(756 * 1024)

which lets me run the number of threads I need and complete most of the jobs and requests. But some servers’ responses are larger than others, and when a thread receives one of those, the script dies with SIGSEGV.
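For reference, a minimal sketch of the setup described above (the URLs and the worker body are placeholders, not the original script): `threading.stack_size()` must be called before the threads are started, and it only affects threads created afterwards.

```python
import threading

def fetch(url):
    # Placeholder worker; the real script requests data
    # from a remote server and parses the reply.
    pass

# Must be set *before* Thread.start(); applies only to threads
# created after this call. Returns the previous setting.
threading.stack_size(756 * 1024)

urls = ["http://example.com/a", "http://example.com/b"]  # hypothetical targets
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```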

Stack sizes larger than 756 KB make it impossible to run the required number of threads at the same time.

Any suggestions on how I can keep the given stack size without crashes? And how can I find the stack space currently used by a given thread?


1 Answer

  1. Answered on May 10, 2026 at 11:23 pm

    Why on earth are you spawning 500 threads? That seems like a terrible idea!

    Remove threading completely, use an event loop to do the crawling. Your program will be faster, simpler, and easier to maintain.

    Lots of threads waiting for network won’t make your program wait faster. Instead, collect all open sockets in a list and run a loop where you check if any of them has data available.
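A rough sketch of that select-style loop, using Python's standard-library `selectors` module (the hosts and ports would come from the crawler's target list; none are shown here):

```python
import selectors
import socket

sel = selectors.DefaultSelector()
responses = {}   # socket -> accumulated reply bytes

def connect(host, port):
    # Open a non-blocking connection and watch it for readability.
    sock = socket.socket()
    sock.setblocking(False)
    sock.connect_ex((host, port))   # non-blocking connect
    sel.register(sock, selectors.EVENT_READ)
    responses[sock] = b""

# connect("crawl-target.example", 80)  # one call per server (placeholder)

while sel.get_map():                    # loop until every socket is done
    for key, _ in sel.select(timeout=1.0):
        sock = key.fileobj
        data = sock.recv(4096)          # read whatever is available
        if data:
            responses[sock] += data     # replies accumulate on the heap,
        else:                           # not on a per-thread stack
            sel.unregister(sock)
            sock.close()
```

Because a single thread drains every socket, a large response only grows a heap buffer instead of overflowing a fixed thread stack.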

    I recommend using Twisted, an event-driven networking engine. It is very flexible, secure, scalable, and very stable (no segfaults).

    You could also take a look at Scrapy, a web crawling and screen-scraping framework written in Python on top of Twisted. It is still under heavy development, but you may be able to take some ideas from it.

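For comparison, the same single-threaded, event-driven pattern the answer describes is also available in Python's standard-library asyncio; this sketch only simulates the per-URL work (the URLs and the body of fetch are placeholders):

```python
import asyncio

async def fetch(url):
    # Placeholder for a real request; yields to the event loop
    # the way a network read would.
    await asyncio.sleep(0)
    return url, len(url)

async def crawl(urls):
    # One lightweight task per URL, all multiplexed on a single
    # thread: no per-thread stacks, so no stack-size tuning.
    return await asyncio.gather(*(fetch(u) for u in urls))

urls = ["http://example.com/a", "http://example.com/b"]  # hypothetical
results = asyncio.run(crawl(urls))
```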


© 2021 The Archive Base. All Rights Reserved