I need to scrape a website that has a basic folder system, with folders

Question


Asked: June 3, 2026


I need to scrape a website that has a basic folder system, with folders labeled with keywords; some of the folders contain text files. I need to scan all the pages (folders), check the links to new folders, and record the keywords and files. My main problem is more abstract: if there is a directory with nested folders of unknown "depth", what is the most Pythonic way to iterate through all of them? (If the "depth" were known, it would be a really simple for loop.) Ideas greatly appreciated.


1 Answer

Editorial Team · Answer 1 · 2026-06-03T21:35:43+00:00

Here’s a simple spider algorithm. It uses a deque for documents to be processed and a set of already processed documents:

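from collections import deque

def crawl(first_document, get_links):
    # get_links is a placeholder: pass in your own function that
    # returns the links found in a document.
    active = deque([first_document])  # documents waiting to be processed
    seen = set()                      # documents already processed

    while active:
        document = active.popleft()
        if document in seen:
            continue

        # do stuff with the document -- e.g. index keywords

        seen.add(document)
        for link in get_links(document):
            active.append(link)

    return seen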

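Note that this is iterative rather than recursive, so it works with arbitrarily deep trees without running into Python's recursion limit. The seen set also keeps it from looping forever if folders link back to each other.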
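As a rough illustration of the same idea applied to the folder layout from the question, here is a minimal sketch that walks an in-memory stand-in for the site. The SITE dictionary, the crawl function, and the folder and file names are all made up for the example; on a real site you would replace the dictionary lookup with a page fetch plus link extraction.

```python
from collections import deque

# Hypothetical stand-in for the website: each folder path maps to the
# subfolders it links to and the text files it contains.
SITE = {
    "/": {"folders": ["/cats", "/dogs"], "files": []},
    "/cats": {"folders": ["/cats/siamese", "/"], "files": ["cats.txt"]},
    "/cats/siamese": {"folders": [], "files": ["siamese.txt"]},
    "/dogs": {"folders": ["/cats"], "files": []},
}


def crawl(start):
    """Visit every folder reachable from start once; return keywords and files."""
    active = deque([start])   # folders waiting to be processed
    seen = set()              # folders already processed
    keywords, files = [], []

    while active:
        folder = active.popleft()
        if folder in seen:
            continue          # skip repeats, which also guards against link cycles
        seen.add(folder)

        page = SITE[folder]
        # The last path component serves as the folder's keyword here.
        keywords.append(folder.rstrip("/").rsplit("/", 1)[-1] or "/")
        files.extend(page["files"])
        active.extend(page["folders"])

    return keywords, files


keywords, files = crawl("/")
print(keywords)
print(files)
```

Because the deque is consumed from the left, folders are visited breadth-first. Swapping active.popleft() for active.pop() would give depth-first order with no other change.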
