I have a function designed to find errors in an application’s search capabilities. It generates a variable-length search string from the non-control UTF-8 characters. When I run it in a pytest loop, the random strings it submits for search trigger debug errors roughly once per 500 searches.
Since I can capture each string that caused an error, I want to determine the minimal subsequence of characters in each string that actually provokes the error. In other words (inside a pytest loop):
def fumble_towards_ecstasy(string_that_breaks):
    # iterate over both length and content of the string
    nugget = ...  # placeholder: minimum series of characters that break the search
    return nugget
Should I slice the string in half, whittle down each side, and resubmit until it fails? Drop a random character (leaving len() - 1 characters) and back up if the error doesn’t happen? Brute-force combinatorial? What’s the best way to step through this?
Thanks.
Splitting the string in half will fail if a two-character sequence causes the failure and that sequence straddles the midpoint. Each half succeeds on its own, but the combined string fails.
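To make the midpoint problem concrete, here is a tiny illustration. The `fails` function is a hypothetical stand-in for "submit this string to the search and see whether it errors"; the trigger sequence `%'` is made up for the example.

```python
def fails(s: str) -> bool:
    # Hypothetical stand-in for the real search call:
    # pretend the bug is triggered by the two-character sequence %'
    return "%'" in s

s = "ab%'cd"              # the trigger straddles the midpoint
mid = len(s) // 2
left, right = s[:mid], s[mid:]

print(fails(s))           # the whole string reproduces the error
print(fails(left))        # "ab%" alone does not
print(fails(right))       # "'cd" alone does not
```

Neither half reproduces the error, so a pure bisection would discard the evidence at the first step.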
Here’s one algorithm that will find a local minimum:
Try removing each character in turn.
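A minimal sketch of that idea, under some assumptions of mine: `fails` is a hypothetical callable that resubmits a string to the search and returns True if the error still occurs, a removal is kept whenever the shortened string still fails, and the whole pass is repeated until no single character can be removed. The result is only a local minimum: no single character can be dropped, but a shorter failing string might still exist.

```python
from typing import Callable

def minimize_by_deletion(s: str, fails: Callable[[str], bool]) -> str:
    """Greedily drop one character at a time while the failure persists."""
    if not fails(s):
        raise ValueError("input does not reproduce the failure")
    changed = True
    while changed:                # repeat until a full pass removes nothing
        changed = False
        i = 0
        while i < len(s):
            candidate = s[:i] + s[i + 1:]   # s without the character at index i
            if fails(candidate):
                s = candidate     # character was not needed; index i now holds the next one
                changed = True
            else:
                i += 1            # character is needed; keep it and move on
    return s
```

For example, with a made-up predicate, `minimize_by_deletion("abc%'xyz", lambda t: "%'" in t)` returns `"%'"`. Each call to `fails` costs one search request, so a string of length n needs on the order of n requests per pass.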