For a system I’m designing, I want to be able to verify whether a specific string is ‘valid’ or not, but I want to keep my database of ‘valid’ strings private.
I want to provide clients with a database of all valid strings, but this database is (one-way) encrypted. I’m reluctant to distribute a key to the client, since there are always ways to obtain that specific key from the assembly code of my program (I presume).
Clients must be able to enter strings into my program, which would then return a boolean based on the presence of the string in my encrypted file.
More importantly, I want my program to easily check whether the string is in the file, but I want to prevent other programs from easily using (and/or reconstructing) the database.
I have formulated this a bit abstractly, because I don’t really know what my system will look like yet, but I want to know whether something like this would be possible.
What you’re looking for are cryptographic hash functions! MD5 and SHA1 are well-known examples, but if you’re building new code without strict performance constraints, SHA256 would be the one to choose. If you’re looking to make it very difficult to recover the original words, you might want to consider scrypt or bcrypt: they are deliberately slow, so every guess costs an attacker more, although they aren’t as popular and your language’s standard library may not include them.
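To make the difference between the two kinds of hash concrete, here is a rough sketch using Python’s `hashlib`. The function names, the `APP_SALT` value and the scrypt cost parameters are my own assumptions for illustration, not recommendations. Note that the salt has to be the same for every entry, because the program must be able to recompute the exact same hash for a lookup.

```python
import hashlib

# Hypothetical application-wide salt (an assumption for this sketch).
# It must be identical for every entry so that lookups are reproducible.
APP_SALT = b"example-app-salt"


def fast_digest(s: str) -> str:
    """Plain SHA-256: cheap to compute, so also cheap to brute-force."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def slow_digest(s: str) -> str:
    """scrypt: deliberately expensive, so each guess costs noticeably more.

    n, r and p are illustrative cost parameters; tune them for your hardware.
    """
    return hashlib.scrypt(
        s.encode("utf-8"), salt=APP_SALT, n=2**14, r=8, p=1, dklen=32
    ).hex()
```

Either function maps the same input to the same output every time, which is what makes a lookup possible. Keep in mind that if your valid strings are short or guessable, anyone with the database can still try candidates one by one; the slow hash only raises the price per guess.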
Then your database can be as simple as an unordered set of hashes, something like (in Python):