Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 854967
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 15, 20262026-05-15T08:00:30+00:00 2026-05-15T08:00:30+00:00

When crawling on webpages I need to be careful as to not make too

  • 0

When crawling on webpages I need to be careful as to not make too many requests to the same domain, for example I want to put 1 s between requests. From what I understand it is the time between requests that is important. So to speed things up I want to use async workflows in F#, the idea being make your requests with 1 sec interval but avoid blocking things while waiting for request response.

let getHtmlPrimitiveAsyncTimer (uri : System.Uri) (timer:int) =
    async{

            let req =  (WebRequest.Create(uri)) :?> HttpWebRequest
            req.UserAgent<-"Mozilla"
            try 

                Thread.Sleep(timer)
                let! resp =    (req.AsyncGetResponse())
                Console.WriteLine(uri.AbsoluteUri+" got response")
                use stream = resp.GetResponseStream()
                use reader = new StreamReader(stream)
                let html = reader.ReadToEnd()
                return html
            with 
            | _ as ex -> return "Bad Link"
                 }

Then I do something like:

let uri1 = System.Uri "http://rue89.com"
let timer = 1000
let jobs = [|for i in 1..10 -> getHtmlPrimitiveAsyncTimer uri1 timer|]

jobs
|> Array.mapi(fun i job -> Console.WriteLine("Starting job "+string i)
                               Async.StartAsTask(job).Result)

Is this alright ? I am very unsure about 2 things:
-Does the Thread.Sleep thing work for delaying the request ?
-Is using StartTask a problem ?

I am a beginner (as you may have noticed) in F# (coding in general actually ), and everything envolving Threads scares me 🙂

Thanks !!

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-15T08:00:31+00:00Added an answer on May 15, 2026 at 8:00 am

    I think what you want to do is
    – create 10 jobs, numbered ‘n’, each starting ‘n’ seconds from now
    – run those all in parallel

    Approximately like

    let makeAsync uri n = async {
        // create the request
        do! Async.Sleep(n * 1000)
        // AsyncGetResponse etc
        }
    
    let a = [| for i in 1..10 -> makeAsync uri i |]
    let results = a |> Async.Parallel |> Async.RunSynchronously
    

    Note that of course they all won’t start exactly now, if e.g. you have a 4-core machine, 4 will start running very soon, but then quickly execute up to the Async.Sleep, at which point the next 4 will run up until their sleeps, and so forth. And then in one second the first async wakes up and posts a request, and another second later the 2nd async wakes up, … so this should work. The 1s is only approximate, since they’re starting their timers each a very tiny bit staggered from one another… you may want to buffer it a little, e.g. 1100 ms or something if the cut-off you need is really exactly a second (network latencies and whatnot still leave a bit of this outside the possible control of your program probably).

    Thread.Sleep is suboptimal, it will work ok for a small number of requests, but you’re burning a thread, and threads are expensive and it won’t scale to a large number.

    You don’t need StartAsTask unless you want to interoperate with .NET Tasks or later do a blocking rendezvous with the result via .Result. If you just want these to all run and then block to collect all the results in an array, Async.Parallel will do that fork-join parallelism for you just fine. If they’re just going to print results, you can fire-and-forget via Async.Start which will drop the results on the floor.

    (An alternative strategy is to use an agent as a throttle. Post all the http requests to a single agent, where the agent is logically single-threaded and sits in a loop, doing Async.Sleep for 1s, and then handling the next request. That’s a nice way to make a general-purpose throttle… may be blog-worthy for me, come to think of it.)

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm crawling an API. There is many many many requests. And if I do
I need to access a lucene index ( created by crawling several webpages using
I want to allow crawling of files in: /directory/ but not crawling of files
I'm trying to test Scrapy for crawling web pages and I do not understand
I'm crawling a web site (only two levels deep), and I want to scrape
I want to stop search engines from crawling my whole website. I have a
I'm creating a web crawling application that I want the 'average' user to be
I want to know when google is crawling the site, preferably by sending myself
I want to be able to run the Scrapy web crawling framework from within
After crawling many websites, in some of them i receive broken-encoding data. I can't

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.