Assume you have access to an “oracle” implementation whose output you trust to be correct.
The most obvious way to test an MD5 implementation seems to be to run a set of known plaintext/hash pairs through it and check that they come out as expected. An arbitrary number of these cases could be constructed by generating random plaintexts (using a static seed to keep the test deterministic) and using the oracle to compute their hashes.
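A minimal sketch of that approach in Python, assuming the standard library's `hashlib.md5` plays the role of the oracle. The names `make_cases` and `mismatches`, and the `candidate` function standing in for the implementation under test, are hypothetical:

```python
import hashlib
import random


def oracle_md5(data: bytes) -> str:
    """Trusted reference: hex digest from the standard library."""
    return hashlib.md5(data).hexdigest()


def make_cases(count: int, max_len: int, seed: int = 12345):
    """Build (plaintext, expected_hash) pairs from a fixed seed."""
    rng = random.Random(seed)  # static seed keeps the cases reproducible
    cases = []
    for _ in range(count):
        length = rng.randint(0, max_len)  # inclusive on both ends
        plaintext = bytes(rng.randrange(256) for _ in range(length))
        cases.append((plaintext, oracle_md5(plaintext)))
    return cases


def mismatches(candidate, cases):
    """Return every (plaintext, expected, actual) where candidate disagrees."""
    failures = []
    for plaintext, expected in cases:
        actual = candidate(plaintext)
        if actual != expected:
            failures.append((plaintext, expected, actual))
    return failures
```

Calling `mismatches(my_md5, make_cases(1000, 16))` with a hypothetical `my_md5` that returns a lowercase hex string would give an empty list when every case agrees with the oracle.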
The major problem I see with this is that it’s not guaranteed to hit possible corner cases. Generating more cases will reduce the likelihood of missing corner cases, but how many cases is enough?
There’s also the side issue of choosing the lengths of these random plaintexts, because MD5 takes an arbitrary-length string as input. For my purposes, I don’t care about long inputs (say, anything longer than 16 bytes). Feel free to use the fact that this is a “special purpose” MD5 implementation in your answer if it makes things simpler, or just answer for the general case if it’s all the same.
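With inputs capped at 16 bytes, one option is to stop picking lengths at random and instead cover every length from 0 to 16 explicitly. The sketch below is only an illustration under that assumption: for each length it uses an all-zero input, an all-`0xFF` input, and a few seeded random inputs, with `hashlib.md5` again as the oracle. The function name and the choice of fill patterns are hypothetical.

```python
import hashlib
import random

MAX_LEN = 16  # the cutoff mentioned above


def length_sweep_cases(max_len: int = MAX_LEN, randoms_per_length: int = 4,
                       seed: int = 2024):
    """Return (plaintext, expected_hash) pairs covering every length 0..max_len."""
    rng = random.Random(seed)
    cases = []
    for length in range(max_len + 1):
        # Fixed patterns first: all zero bytes and all 0xFF bytes.
        inputs = {bytes(length), b"\xff" * length}
        for _ in range(randoms_per_length):
            inputs.add(bytes(rng.randrange(256) for _ in range(length)))
        # Sort so the case order does not depend on set iteration order.
        for data in sorted(inputs):
            cases.append((data, hashlib.md5(data).hexdigest()))
    return cases
```

Every length is then guaranteed to appear at least once, which a purely random choice of lengths cannot promise for a small number of cases.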
If you have an algorithmic error, it’s extremely likely that every hash will be wrong. Hashes are unforgiving by nature.
Since the majority of possible errors will be exposed quickly, you really won’t need that many tests. The main things to cover are the edge cases:
If those all pass, perhaps along with tests for one or two more representative inputs, you could be pretty confident in your algorithm. There aren’t that many edge cases (unless someone more familiar with the algorithm’s details can think of some more).
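As a sketch of what such a small fixed table could look like, the vectors below come from the test suite in RFC 1321, restricted to inputs of at most 16 bytes: the empty string plus three short strings. The helper name `failing_vectors` is hypothetical, and `md5_hex` stands for whatever function wraps the implementation under test and returns a lowercase hex digest.

```python
import hashlib

# (input, expected hex digest) pairs from the RFC 1321 test suite,
# limited to inputs of 16 bytes or fewer.
RFC1321_VECTORS = [
    (b"", "d41d8cd98f00b204e9800998ecf8427e"),
    (b"a", "0cc175b9c0f1b6a831c399e269772661"),
    (b"abc", "900150983cd24fb0d6963f7d28e17f72"),
    (b"message digest", "f96b697d7cb7938d525a2f31aaf161d0"),
]


def failing_vectors(md5_hex):
    """Return the inputs for which md5_hex disagrees with the table."""
    return [data for data, expected in RFC1321_VECTORS
            if md5_hex(data) != expected]


def reference(data: bytes) -> str:
    """Standard-library MD5, used here only to sanity-check the table."""
    return hashlib.md5(data).hexdigest()
```

`failing_vectors(reference)` should come back empty. An implementation with an algorithmic error would most likely fail every entry, including the empty string.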