Let’s say you make an application that attempts to transliterate stuff from alphabet A into alphabet B, as closely as possible.
Because alphabet B is very complex, this is not always successful, but you do get an approximate transliteration.
How would you build unit tests in this case, considering that you expect 20-30% of the test cases to fail?
The goal must always be that your unit tests pass. The way you are using them, you cannot differentiate between serious errors in your software and the transliteration errors you already expect.
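To make the difference concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy `transliterate` function, its `TABLE`, the small `CORPUS`, and the `MIN_ACCURACY` value are all hypothetical and stand in for your real application and data. The strict tests cover cases the code is known to handle, so any failure there is a real bug. The expected misses are only measured as an overall rate against a threshold.

```python
# Hypothetical toy transliterator: maps a few Latin letters to Cyrillic.
# The table and all names here are illustrative assumptions only.
TABLE = {"a": "а", "b": "б", "d": "д", "k": "к", "o": "о", "t": "т"}


def transliterate(text: str) -> str:
    """Map each character through TABLE; unknown characters pass through."""
    return "".join(TABLE.get(ch, ch) for ch in text)


def accuracy(pairs) -> float:
    """Fraction of (source, expected) pairs that are transliterated exactly."""
    hits = sum(1 for src, expected in pairs if transliterate(src) == expected)
    return hits / len(pairs)


# 1) Strict unit tests: inputs the code is known to handle.
#    A failure here always means a bug in the software.
def test_known_cases():
    assert transliterate("") == ""
    assert transliterate("kot") == "кот"
    assert transliterate("123") == "123"  # unknown characters are kept


# 2) Quality check: a corpus in which some misses are expected.
#    Individual misses do not fail the test; only a drop in the
#    overall rate below the agreed threshold does.
CORPUS = [
    ("da", "да"),
    ("tok", "ток"),
    ("bak", "бак"),
    ("dot", "дот"),
    ("ya", "я"),  # digraph the toy table cannot handle: an expected miss
]
MIN_ACCURACY = 0.70  # assumed threshold, matching "20-30% may fail"


def test_accuracy_threshold():
    assert accuracy(CORPUS) >= MIN_ACCURACY
```

With this split, both tests pass on a healthy build. A regression in the core mapping breaks `test_known_cases`, while a gradual loss of quality shows up in `test_accuracy_threshold`.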
I would suggest separating the two: