When I search for something, I get pages that have the same text and title.
Of course, there is always an original (where others copy/leech from)
If you have expertise in search and crawling, how do you recommend I remove these duplicates, in a feasible and efficient manner?
Sounds like a programming question to me.
If you have a clear idea of what the stolen and original components of these pages are, and those differences are general enough that you can write a filter to separate them, then do that. Hash the 'stolen' content, and you should then be able to compare hashes to determine whether two pages are the same.
I guess web-page thieves might go to some further code obfuscation to mess you up, including changing whitespace, so you might want to normalise the HTML before hashing, for instance removing any redundant whitespace and making all attribute values use double quotes, etc.
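A minimal sketch of that normalise-then-hash idea in Python (the normalisation rules here are just assumptions; stripping tags entirely is one crude way to ignore markup differences, and you'd tune it for your actual data):

```python
import hashlib
import re

def normalize_html(html: str) -> str:
    """Crude normalisation: drop tags, collapse whitespace, lowercase."""
    text = re.sub(r"<[^>]+>", " ", html)          # strip all tags
    text = re.sub(r"\s+", " ", text).strip()      # collapse whitespace runs
    return text.lower()

def content_hash(html: str) -> str:
    """Hash of the normalised content, for duplicate detection."""
    return hashlib.sha256(normalize_html(html).encode("utf-8")).hexdigest()

# Two pages differing only in markup and whitespace hash identically:
a = "<p>Hello   <b>World</b></p>"
b = "<P>Hello <B>World</B>  </P>"
print(content_hash(a) == content_hash(b))  # True
```

In practice you'd store each hash in a set (or database index) as you crawl and skip any page whose hash you've already seen. Exact hashing only catches byte-identical (after normalisation) copies; for near-duplicates, look into shingling or SimHash.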