I am writing a web crawler that searches for files and downloads them. My problem is that I do not want to download files that have already been downloaded to the local drive. I know it's possible to compare files using an MD5 hash, but how can I do this for an HTTP URL without downloading the file to the local disk?
If this approach is wrong, please advise on a better solution.
No, not unless the web server provides some way of sharing the MD5.
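Computing a file hash requires every byte of the file, so the crawler would have to download the whole file just to hash it. That dependence on every byte is also why changing a single byte changes the hash, which is what lets a hash detect modified files.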
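If the server does happen to share a hash, a crawler could check for it with a HEAD request before downloading anything. Below is a minimal sketch in Python, assuming the server sends a Content-MD5 header (the Base64-encoded MD5 digest from RFC 1864, which many servers do not send) and a Content-Length that matches the stored file size. The function names local_md5, fetch_headers and already_downloaded are made up for illustration, and the headers mapping is assumed to have lower-case keys.

```python
import base64
import hashlib
from pathlib import Path
from typing import Mapping, Optional
from urllib.request import Request, urlopen


def local_md5(path: Path, chunk_size: int = 65536) -> bytes:
    """Return the raw MD5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


def fetch_headers(url: str) -> Mapping[str, str]:
    """Send a HEAD request and return the response headers with lower-case keys."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def already_downloaded(headers: Mapping[str, str], path: Path) -> Optional[bool]:
    """Compare server-provided metadata with a local file.

    Returns True or False when the headers allow a decision, and None when
    the server shared nothing usable, in which case the only way to be sure
    is to download the file.
    """
    if not path.exists():
        return False

    # A differing size is a cheap way to rule out a match.
    length = headers.get("content-length", "").strip()
    if length.isdigit() and int(length) != path.stat().st_size:
        return False

    # Assumption: Content-MD5 carries the Base64-encoded MD5 of the body.
    md5_header = headers.get("content-md5")
    if md5_header:
        try:
            remote_digest = base64.b64decode(md5_header)
        except ValueError:
            return None  # header present but not valid Base64
        return remote_digest == local_md5(path)

    return None
```

A crawler would call something like already_downloaded(fetch_headers(url), Path("downloads/file.bin")) and skip the URL only when the result is True. A result of None means the server shared no hash, which is the usual case described above.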