I’m writing a script in Python that saves attachments from Gmail, only from unseen emails. To save on bandwidth I want to make sure that every file only gets downloaded once.
-I can’t check the folder where I save them, because a file could already have been removed, and in that case it shouldn’t be downloaded again. (The script accesses the Inbox read_only, so it doesn’t mark the email as read. As soon as the script runs again it will download the same attachments again, until the email gets marked as read via another channel.)
-Right now I save the filename to a sqlite database, but there are 2 problems: I haven’t figured out how to check the database for the filename the next time I run the script, and there’s also a chance that at some point down the line a different attachment arrives with the same filename, which then wouldn’t get downloaded.
What’s a safe and scalable way to make sure I don’t download the files more than once?
You could save not only the filename to the database but also, for example, the Date: header of the mail (or any combination of headers that you are sure identifies a mail uniquely). A later attachment with the same filename then arrives with different header values, so it still gets downloaded.
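A minimal sketch of how the lookup could work with Python's built-in sqlite3 module. The table name `downloaded`, the column names, and the helper functions are made up for illustration, and it assumes you have already read the filename and the Date: header from the message. If you don't trust Date: to be unique, the Message-ID header could be stored in the same way.

```python
import sqlite3


def open_db(path="attachments.db"):
    """Open the database and create the bookkeeping table if needed."""
    conn = sqlite3.connect(path)
    # The composite primary key makes each (filename, date_header) pair unique.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloaded ("
        " filename TEXT NOT NULL,"
        " date_header TEXT NOT NULL,"
        " PRIMARY KEY (filename, date_header))"
    )
    return conn


def already_downloaded(conn, filename, date_header):
    """Return True if this attachment was recorded on an earlier run."""
    row = conn.execute(
        "SELECT 1 FROM downloaded WHERE filename = ? AND date_header = ?",
        (filename, date_header),
    ).fetchone()
    return row is not None


def mark_downloaded(conn, filename, date_header):
    """Record the attachment; a repeated insert is silently ignored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO downloaded (filename, date_header)"
            " VALUES (?, ?)",
            (filename, date_header),
        )
```

In the download loop you would call `already_downloaded(...)` before fetching the attachment and `mark_downloaded(...)` right after saving it. Because the record lives in the database and not in the folder, deleting the saved file doesn't trigger another download.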