I have been using Git a lot recently, and I quite like how it avoids duplicating identical data by addressing content with a SHA-1 hash. I was wondering whether current databases do something similar, or whether this is inefficient for some reason.
I came up with a nice “reuse-based-on-hash” technique (it’s probably widely used, though).
I computed a hash over all the fields in the row and used that hash as the primary key.
When inserting, I simply used “INSERT IGNORE” to suppress the duplicate-primary-key error. Either way, I could be sure that the row I wanted to insert was present in the database afterwards.
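Roughly, the idea looks like the sketch below. This is only an illustration, not my original code: it uses Python with an in-memory SQLite database, where `INSERT OR IGNORE` stands in for MySQL's `INSERT IGNORE`, and the `people` table plus the `row_key` and `insert_once` helpers are made-up names for the example.

```python
import hashlib
import sqlite3


def row_key(fields):
    """Return a SHA-1 hex digest computed over all fields of a row."""
    h = hashlib.sha1()
    for field in fields:
        data = str(field).encode("utf-8")
        # Prefix each field with its length so ("ab", "c") and ("a", "bc")
        # do not hash to the same key.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def insert_once(conn, name, email):
    """Insert the row unless a row with the same content hash already exists."""
    key = row_key((name, email))
    # Hypothetical table; OR IGNORE skips the insert on a duplicate primary key.
    conn.execute(
        "INSERT OR IGNORE INTO people (id, name, email) VALUES (?, ?, ?)",
        (key, name, email),
    )
    return key


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id TEXT PRIMARY KEY, name TEXT, email TEXT)")

k1 = insert_once(conn, "Ada", "ada@example.com")
k2 = insert_once(conn, "Ada", "ada@example.com")  # same content, ignored
k3 = insert_once(conn, "Bob", "bob@example.com")

row_count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
```

Inserting the same content twice yields the same key and leaves a single row. One caveat: if two different rows ever produced the same hash, the second one would be dropped silently, so this relies on collisions being practically impossible for the data at hand.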
If this is a known concept I’d be glad to hear about it!