I’m planning to build an application which would crawl a part of a local


Asked: June 15, 2026 (2026-06-15T22:07:42+00:00)


I’m planning to build an application which would crawl a part of a local filesystem (a subtree) in a depth-first-search manner and process all files it finds, except for some configurable exceptions.

To give an example, let’s say I have a directory structure like this:

> documents
  - generic-doc.txt
  > mails
    - mail-01.txt
    - mail-02.txt
    - mail-03.txt
    > unread
      - mail-04.txt
  > invoices
    > paid
      - invoice-01.pdf
      - invoice-02.pdf
    > unpaid
      - invoice-03.pdf

I also have an exclusion rule like this:

exclude = "documents/mails/unread | documents/invoices"

Given this input, my application would process the following documents:

generic-doc.txt
mail-01.txt
mail-02.txt
mail-03.txt

(i.e. it would process all files except those located in the documents/mails/unread and documents/invoices folders)

In the future, I might need to implement various other forms of exclusion rules.

What is the best way to test the implementation of the crawling module (for example, to verify that, given an exclusion rule, the module returns the correct set of documents)? Can it be done without using a real filesystem?


1 Answer

Editorial Team · Answer 1 · 2026-06-15T22:07:43+00:00


Extract the exclusion logic into a separate module/class/object and test that in isolation. Then make sure that your crawler consults the exclusion rule before processing a file.

A sketch:

import java.io.File;

/** Decides whether the crawler should skip the given file or directory. */
public interface FileExcluder {
   boolean isExcluded(File aFile);
}
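To make the idea concrete, here is a minimal sketch of one possible implementation. It assumes the rule format from the question (paths separated by "|"). The class name PathPrefixExcluder is hypothetical, and the interface is repeated (package-private) only so the example compiles on its own. It works purely on path strings, so it can be tested without touching a real filesystem.

```java
import java.io.File;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Repeated from the sketch above so this example is self-contained.
interface FileExcluder {
    boolean isExcluded(File aFile);
}

// Hypothetical implementation: excludes anything at or below a listed directory.
final class PathPrefixExcluder implements FileExcluder {
    private final List<Path> excludedRoots = new ArrayList<>();

    // The rule uses the question's format: paths separated by '|'.
    PathPrefixExcluder(String rule) {
        for (String part : rule.split("\\|")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                // normalize() is purely syntactic; nothing is read from disk.
                excludedRoots.add(new File(trimmed).toPath().normalize());
            }
        }
    }

    @Override
    public boolean isExcluded(File aFile) {
        Path candidate = aFile.toPath().normalize();
        for (Path root : excludedRoots) {
            // Path.startsWith compares whole name elements, so
            // "documents/invoices-old" is not matched by "documents/invoices".
            if (candidate.startsWith(root)) {
                return true;
            }
        }
        return false;
    }
}
```

With the rule "documents/mails/unread | documents/invoices", new File("documents/mails/unread/mail-04.txt") is reported as excluded and new File("documents/mails/mail-01.txt") is not. Neither file has to exist on disk, so a plain unit test is enough for the rule logic.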

Note that the JDK already has java.io.FileFilter, which provides a similar service (its accept(File) method returns true for files to keep); maybe you can reuse that abstraction.
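If you go the FileFilter route, the crawler could look roughly like the sketch below. The Crawler class and its method names are hypothetical, not part of any library. It relies on File.listFiles(FileFilter), so the crawler itself still reads from disk, while the exclusion logic stays in the filter and is tested separately.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical crawler: a depth-first walk that skips whatever the filter rejects.
final class Crawler {
    private final FileFilter filter;

    Crawler(FileFilter filter) {
        this.filter = filter;
    }

    // Returns all accepted regular files below root, in depth-first order.
    List<File> crawl(File root) {
        List<File> found = new ArrayList<>();
        visit(root, found);
        return found;
    }

    private void visit(File dir, List<File> found) {
        // listFiles(FileFilter) returns null if dir is not a readable directory.
        File[] children = dir.listFiles(filter);
        if (children == null) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                // Rejected directories never reach this point, so their
                // whole subtree is skipped.
                visit(child, found);
            } else {
                found.add(child);
            }
        }
    }
}
```

An existing FileExcluder can be plugged in with a lambda such as file -> !excluder.isExcluded(file), since FileFilter has a single method. A test for the crawler then only needs to confirm that it consults the filter; a small temporary directory or a trivial filter is enough for that.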
