I’ve just started learning go, and have been working through the tour. The last

Asked: June 10, 2026 (2026-06-10T15:22:38+00:00)

I’ve just started learning Go and have been working through the tour. The last exercise is to modify a web crawler so that it crawls in parallel and never fetches the same URL twice.

Here is the link to the exercise: http://tour.golang.org/#70

Here is the code. I only changed the Crawl and main functions, so I’ll post just those to keep it neat.

    // Crawl uses fetcher to recursively crawl
    // pages starting with url, to a maximum of depth.
    var used = make(map[string]bool)
    var urlchan = make(chan string)
    func Crawl(url string, depth int, fetcher Fetcher) {
        // TODO: Fetch URLs in parallel.
        // Done: Don't fetch the same URL twice.
        // This implementation doesn't do either:
        done := make(chan bool)
        if depth <= 0 {
            return
        }
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Printf("\nfound: %s %q\n\n", url, body)
        go func() {
            for _, i := range urls {
                urlchan <- i
            }
            done <- true
        }()
        for u := range urlchan {
            if used[u] == false {
                used[u] = true
                go Crawl(u, depth-1, fetcher)
            }
            if <-done == true {
                break
            }
        }
        return
    }

    func main() {
        used["http://golang.org/"] = true
        Crawl("http://golang.org/", 4, fetcher)
    }

The problem is that when I run the program the crawler stops after printing

    not found: http://golang.org/cmd/

This only happens when I try to make the program run in parallel. If I run it sequentially, all the URLs are found correctly.

Note: If I am not doing this right (parallelism I mean) then I apologise.

1 Answer

Editorial Team · Answer 1 · 2026-06-10T15:22:39+00:00

Be careful with goroutines.
When the main goroutine (the one running main()) returns, the program exits and every other goroutine is killed immediately.
Your Crawl() looks recursive, but it is not: each nested call is started with go, so the caller does not wait for it. Nothing makes the first Crawl(), the one called by main(), wait for the goroutines it started, and as soon as it returns, main() considers its job done and the program exits.
What you can do is make main() wait until the last Crawl() has returned. The sync package or a channel would help.

You could take a look at the following solution, which I wrote a few months ago:

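    // Fetcher and fetcher come from the tour exercise and are not repeated here.

    // store records every URL that has already been scheduled for fetching.
    // Only Crawl reads and writes it, so it needs no locking.
    var store map[string]bool

    // Krawl fetches a single URL and sends the links it found on Urls.
    // It sends exactly once per call, even when the fetch fails.
    func Krawl(url string, fetcher Fetcher, Urls chan []string) {
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
        } else {
            fmt.Printf("found: %s %q\n", url, body)
        }
        Urls <- urls
    }

    // Crawl fetches pages level by level, starting with url, to a maximum of depth.
    // It returns only after every Krawl goroutine it started has reported back.
    func Crawl(url string, depth int, fetcher Fetcher) {
        if depth <= 0 {
            return
        }
        Urls := make(chan []string)
        go Krawl(url, fetcher, Urls)
        band := 1         // fetches in flight on the current level
        store[url] = true // init for level 0 done
        for i := 0; i < depth && band > 0; i++ {
            nextBand := 0 // fetches started for the next level
            for ; band > 0; band-- {
                next := <-Urls
                if i+1 >= depth {
                    continue // depth limit reached: do not follow these links
                }
                for _, u := range next {
                    if !store[u] {
                        store[u] = true
                        nextBand++
                        go Krawl(u, fetcher, Urls)
                    }
                }
            }
            band = nextBand
        }
    }

    func main() {
        store = make(map[string]bool)
        Crawl("http://golang.org/", 4, fetcher)
    }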
