For very large strings (spanning multiple lines), is it faster to use Python’s built-in string search, or to split the large string (perhaps on \n) and iteratively search the smaller strings?
E.g., for very large strings:
for l in get_mother_of_all_strings().split('\n'):
    if 'target' in l:
        return True
return False
or
return 'target' in get_mother_of_all_strings()
Certainly the second. I don’t see any difference between doing one search in a big string and doing many searches in small strings. You may skip some characters thanks to the shorter lines, but the split operation has its own costs too (searching for \n, creating n different strings, creating the list), and the loop runs in Python. The string __contains__ method is implemented in C and so is noticeably faster.
Also consider that the second method stops as soon as the first match is found, whereas the first one splits the whole string before it even starts searching inside it.
This is rapidly proven with a simple benchmark:
The result is:
The bible.txt file really is the Bible; I found it here: http://patriot.net/~bmcgin/kjvpage.html (text version)
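
For anyone who wants to try the comparison themselves, a benchmark along these lines can be sketched with timeit. This is only a sketch under my own assumptions, not the original script: the names search_by_lines, search_whole and make_text are made up for illustration, and a synthetic text stands in for bible.txt. Timings depend on the machine and Python version, so none are shown.

```python
import timeit


def search_by_lines(text, target):
    """Split on newlines first, then search each line in a Python loop."""
    for line in text.split('\n'):
        if target in line:
            return True
    return False


def search_whole(text, target):
    """Single substring search over the whole string (str.__contains__)."""
    return target in text


def make_text(n_lines=200_000, hit_at=None, target='target'):
    """Build a synthetic multi-line text (hypothetical stand-in for bible.txt).

    If hit_at is given, that line is replaced by one containing the target.
    """
    lines = ['the quick brown fox jumps over the lazy dog'] * n_lines
    if hit_at is not None:
        lines[hit_at] = 'here is the ' + target
    return '\n'.join(lines)


if __name__ == '__main__':
    # Two cases: a match in the middle of the text, and no match at all.
    cases = {
        'match in the middle': make_text(hit_at=100_000),
        'no match': make_text(),
    }
    for label, text in cases.items():
        for func in (search_by_lines, search_whole):
            # Time 5 calls of each approach on the same text.
            seconds = timeit.timeit(lambda: func(text, 'target'), number=5)
            print(f'{label:>20}  {func.__name__:<16} {seconds:.4f}s')
```

To run it against a real file instead, read the file into a string once (outside the timed call) and pass that string in place of the synthetic text.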