Is there a way to find out if a string contains any one of the characters in a set, using Python?
It’s straightforward to do it with a single character, but I need to check and see if a string contains any one of a set of bad characters.
Specifically, suppose I have a string:
s = 'amanaplanacanalpanama~012345'
and I want to see if the string contains any vowels:
bad_chars = 'aeiou'
and do this in a for loop for each line in a file:
if [any one or more of the bad_chars] in s:
do something
I am scanning a large file, so if there is a faster way to do this, that would be ideal. Also, not every bad character has to be checked; as soon as one is encountered, the search can end.
I’m not sure if there is a builtin function or easy way to implement this, but I haven’t come across anything yet. Any pointers would be much appreciated!
“so long as one is encountered that is enough to end the search.” – This will be true if you use the first method.
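For reference, a short-circuiting check of this kind can be written with any() and a generator expression. This is only a minimal sketch, not necessarily the exact snippet referred to above; the helper name contains_any is made up for illustration, while s and bad_chars are taken from the question.

```python
def contains_any(s, bad_chars):
    """Return True if at least one character of s appears in bad_chars."""
    # any() stops consuming the generator at the first True value,
    # so the scan ends as soon as one bad character is seen.
    return any(c in bad_chars for c in s)


s = 'amanaplanacanalpanama~012345'
bad_chars = 'aeiou'

if contains_any(s, bad_chars):
    print('found a bad character')
```

Because the generator is lazy, the rest of the string is never examined once a match turns up.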
You say you are concerned with performance: performance should not be an issue unless you are dealing with a huge amount of data. If you encounter issues, you can try:
Regexes
edit: Previously I had written a section here on using regexes, via the re module, programmatically generating a regex that consisted of a single character class [...] and using .finditer, with the caveat that putting a simple backslash before everything might not work correctly. Indeed, after testing it, that is the case, and I would definitely not recommend this method. Using this would require reverse engineering the entire (slightly complex) sub-grammar of regex character classes (e.g. you might have characters like \ followed by w, like ] or [, or like -, and merely escaping some, like \w, may give it a new meaning).
Sets
Depending on whether the str.__contains__ operation is O(1) or O(N), it may be justifiable to first convert your text/lines into a set to ensure the in operation is O(1), if you have many badChars.
(It may be possible to make that a one-liner, any((c in set(yourString)) for c in badChars), depending on how smart the python compiler is.)
Do you really need to do this line-by-line?
It may be faster to do this once for the entire file O(#badchars), than once for every line in the file O(#lines*#badchars), though the asymptotic constants may be such that it won’t matter.
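To make the set idea and the whole-file idea concrete, here is a rough sketch under a few assumptions: the function names and the path argument are hypothetical, the file is assumed to be UTF-8 text small enough to read into memory, and the bad characters are the ones that go into the set (rather than the text), so each membership test is a hash lookup. Whether this actually beats the plain any() loop depends on your data, so it is worth timing both.

```python
def line_has_bad_char(line, bad_set):
    """Return True if line shares at least one character with bad_set."""
    # isdisjoint() is True when nothing is shared; in CPython it returns
    # as soon as it meets the first shared character.
    return not bad_set.isdisjoint(line)


def file_has_bad_char(path, bad_chars):
    """Check the whole file in one pass instead of line by line."""
    bad_set = frozenset(bad_chars)  # built once, reused for the whole file
    # Assumption: the file fits in memory and is UTF-8 encoded.
    with open(path, encoding='utf-8') as f:
        return line_has_bad_char(f.read(), bad_set)


bad_set = frozenset('aeiou')
if line_has_bad_char('amanaplanacanalpanama~012345', bad_set):
    print('found a bad character')
```

If you still need to know which lines are affected, loop over the file object and call line_has_bad_char on each line, building bad_set once outside the loop.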