I have a simple task I need to perform in Python: convert a string to all lowercase and strip out every character that is not an ASCII letter.
For example:
    'This is a Test' -> 'thisisatest'
    'A235th@#$&( er Ra{}|?>ndom' -> 'atherrandom'
I have a simple function to do this:
    import string
    import sys

    def strip_string_to_lowercase(s):
        tmpStr = s.lower().strip()
        retStrList = []
        for x in tmpStr:
            if x in string.ascii_lowercase:
                retStrList.append(x)
        return ''.join(retStrList)
But I cannot help thinking there is a more efficient, or more elegant, way.
Thanks!
Edit:
Thanks to all those who answered. I learned, and in some cases re-learned, a good deal of Python.
Another solution (not very Pythonic, but very fast) is to use string.translate, though note that this will not work for unicode. It’s also worth noting that you can speed up Dana’s code by moving the characters into a set, which looks up by hash rather than performing a linear search each time. Here are the timings I get for several of the solutions given:
This gives me:
[Edit] Updated with filter solutions as well. (Note that using set.__contains__ makes a big difference here, as it avoids making an extra function call for the lambda.)
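For reference, here is a minimal sketch of the three approaches mentioned above (set lookup, filter with set.__contains__, and translate). It is not the code that was originally timed. It assumes Python 3, where the old string.translate function no longer exists, so bytes.translate with a delete argument stands in for it. The function names and the small timing harness are made up for illustration.

```python
import string
import timeit

# Set of ASCII letters: membership tests hash instead of scanning a string.
ASCII_LETTERS = frozenset(string.ascii_letters)

# Every byte value that is NOT an ASCII letter, used as the "delete"
# argument of bytes.translate (assumption: Python 3 bytes API).
NON_LETTER_BYTES = bytes(b for b in range(256) if chr(b) not in ASCII_LETTERS)


def strip_with_set(s):
    """Keep only ASCII letters via a set lookup, then lowercase."""
    return ''.join(c for c in s if c in ASCII_LETTERS).lower()


def strip_with_filter(s):
    """Same idea with filter; the bound method avoids a lambda call per character."""
    return ''.join(filter(ASCII_LETTERS.__contains__, s)).lower()


def strip_with_translate(s):
    """Drop non-ASCII characters while encoding, then delete non-letter bytes."""
    raw = s.encode('ascii', 'ignore')
    return raw.translate(None, NON_LETTER_BYTES).decode('ascii').lower()


def time_all(sample='A235th@#$&( er Ra{}|?>ndom', number=100_000):
    """Print rough timings; absolute numbers depend on machine and Python version."""
    for fn in (strip_with_set, strip_with_filter, strip_with_translate):
        elapsed = timeit.timeit(lambda: fn(sample), number=number)
        print(f'{fn.__name__:22s} {elapsed:.3f}s')


if __name__ == '__main__':
    time_all()
```

All three filter on ASCII letters first and lowercase afterwards, so they agree with each other even on input containing non-ASCII characters.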