I have a 40KB HTML page and I want to find certain patterns in it.

Asked: May 19, 2026 (2026-05-19T10:19:34+00:00)

I have a 40KB HTML page and I want to find certain patterns in it.

I can read it with a 1K buffer, but I want to avoid the situation where a pattern I'm searching for is split between two buffer reads.

How can I overcome this problem?


1 Answer

Editorial Team · Answer 1 · 2026-05-19T10:19:35+00:00

This is easy. Determine the length of the longest pattern you will look for, then either move the file pointer back by that amount after each read, or slide through the data, keeping the tail of the buffer and reading only the delta.

Imagine the longest pattern is 26 bytes.

1. Read 1k.
2. Check for all patterns. If nothing is found, continue.
3. Drop the first 1k - 26 bytes from the buffer, keeping the last 26.
4. Read 1k - 26 bytes from the stream and append them to the buffer.
5. Go to step 2.
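The steps above can be sketched in Python roughly as follows. This is only a minimal sketch under a few assumptions: the patterns are non-empty `bytes` literals (not regular expressions), the function name `find_patterns` and its parameters are made up for illustration, and the stream is any object with a `read(n)` method. It carries over one byte less than the longest pattern, which is the minimum needed to catch a split match, and it skips matches that sit entirely inside the carried-over tail so that shorter patterns are not reported twice.

```python
import io


def find_patterns(stream, patterns, chunk_size=1024):
    """Yield (absolute_offset, pattern) for every match in a byte stream.

    Hypothetical helper: assumes literal, non-empty bytes patterns.
    """
    if not patterns or any(len(p) == 0 for p in patterns):
        raise ValueError("patterns must be non-empty bytes objects")
    overlap = max(len(p) for p in patterns) - 1
    tail = b""   # bytes carried over from the previous read
    base = 0     # absolute stream offset of tail[0]
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        buf = tail + chunk
        carried = len(tail)
        for p in patterns:
            # A match lying entirely inside the carried tail was already
            # reported in the previous round, so start just past that point.
            i = buf.find(p, max(0, carried - len(p) + 1))
            while i != -1:
                yield base + i, p
                i = buf.find(p, i + 1)
        keep = min(overlap, len(buf))
        tail = buf[len(buf) - keep:]
        base += len(buf) - keep


# Usage: "<title>" starts at offset 1020, so it straddles the first 1K read.
page = b"a" * 1020 + b"<title>" + b"b" * 3000 + b"</title>"
for offset, pattern in find_patterns(io.BytesIO(page), [b"<title>", b"</title>"]):
    print(offset, pattern)
```

Because the tail is kept in memory rather than re-read, this variant never needs to seek, which is why it suits sockets and pipes.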

Edit: Let me clarify. There are two ways to do this, and both have their merits. The one I described above is best when you are reading from a stream, meaning a data source that does not support seeking. If your data source does support seeking (like a file on a filesystem), you can easily do the same with seeks: check for the patterns and, if nothing is found, seek back by the length of your longest pattern, then continue reading from there.
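For the seekable case, a sketch along these lines might work. The name `find_in_file` and the demo file contents are assumptions for illustration, and the patterns are again assumed to be non-empty literal `bytes`. It seeks back by the longest pattern length minus one, which is enough to re-read any match that could have been cut off, and it ignores matches that lie entirely in the re-read region.

```python
import os
import tempfile


def find_in_file(path, patterns, chunk_size=1024):
    """Return a list of (offset, pattern) matches in a seekable file.

    Hypothetical helper: assumes literal, non-empty bytes patterns.
    """
    if not patterns or any(len(p) == 0 for p in patterns):
        raise ValueError("patterns must be non-empty bytes objects")
    longest = max(len(p) for p in patterns)
    if chunk_size < longest:
        raise ValueError("chunk_size must be at least the longest pattern length")
    overlap = longest - 1
    hits = []
    with open(path, "rb") as f:
        reread = 0  # how many leading bytes of buf were seen in the last read
        while True:
            base = f.tell()
            buf = f.read(chunk_size)
            for p in patterns:
                # Skip matches fully inside the re-read region: already found.
                i = buf.find(p, max(0, reread - len(p) + 1))
                while i != -1:
                    hits.append((base + i, p))
                    i = buf.find(p, i + 1)
            if len(buf) < chunk_size:  # short read means end of file
                return hits
            f.seek(-overlap, os.SEEK_CUR)
            reread = overlap


# Usage: write a small demo file whose "<title>" tag straddles the 1K boundary.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a" * 1020 + b"<title>Hi</title>")
demo_hits = find_in_file(tmp.name, [b"<title>", b"</title>"])
os.remove(tmp.name)
```

The trade-off compared with the streaming version is a few redundant bytes read per chunk in exchange for not having to copy a tail around.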

If, however, you want to support searching for patterns that are longer than your buffer size, you need a more clever algorithm. You would need a lookup table of all patterns that are currently “open” (partially matched) when you continue to read more data, which in turn costs more memory. You see the problem.
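One established way to track a partially matched pattern across reads is the Knuth-Morris-Pratt (KMP) algorithm. That is a swapped-in technique, not necessarily the lookup table described above. The sketch below assumes a single non-empty literal `bytes` pattern per matcher, and the names `build_failure` and `StreamMatcher` are hypothetical. Its only state between reads is one integer (how many pattern bytes are currently matched) plus a failure table the size of the pattern, so the pattern can be longer than the read buffer.

```python
def build_failure(pattern):
    """Standard KMP failure table: fail[i] is the length of the longest
    proper prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


class StreamMatcher:
    """Hypothetical helper that matches one pattern across chunk boundaries."""

    def __init__(self, pattern):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.fail = build_failure(pattern)
        self.state = 0  # number of pattern bytes matched so far ("open" match)
        self.pos = 0    # total bytes consumed from the stream

    def feed(self, chunk):
        """Consume one chunk; return absolute start offsets of completed matches."""
        hits = []
        for byte in chunk:
            while self.state and byte != self.pattern[self.state]:
                self.state = self.fail[self.state - 1]
            if byte == self.pattern[self.state]:
                self.state += 1
            self.pos += 1
            if self.state == len(self.pattern):
                hits.append(self.pos - len(self.pattern))
                self.state = self.fail[self.state - 1]
        return hits


# Usage: a 9-byte pattern fed through 4-byte reads, so it never fits in one chunk.
matcher = StreamMatcher(b"abcabcabd")
data = b"xxabcabcabcabdxxabcabcabd"
found = []
for start in range(0, len(data), 4):
    found += matcher.feed(data[start:start + 4])
```

For several patterns you could keep one matcher per pattern and feed each chunk to all of them. The memory cost then grows with the total pattern length rather than with the amount of data read.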
