I need to split strings of data using each character from string.punctuation and string.whitespace as a separator.
Furthermore, I need the separators to remain in the output list, in between the items they separated in the string.
For example,
"Now is the winter of our discontent"
should output:
['Now', ' ', 'is', ' ', 'the', ' ', 'winter', ' ', 'of', ' ', 'our', ' ', 'discontent']
I’m not sure how to do this without resorting to an orgy of nested loops, which is unacceptably slow. How can I do it?
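Other answers to this question took the regular-expression route. For comparison, here is a minimal sketch of that approach. It relies on one documented behavior of re.split: when the pattern contains a capturing group, the matched separators are returned along with the pieces. The name split_regex is illustrative, not from the question.

```python
import re
import string

# Every punctuation and whitespace character counts as a separator.
SEPARATORS = string.punctuation + string.whitespace

# A character class matching exactly one separator. re.escape protects
# characters such as "]", "\\", "^" and "-" that are special inside a class.
# The capturing group makes re.split keep each matched separator.
_SEP_RE = re.compile("([" + re.escape(SEPARATORS) + "])")

def split_regex(s):
    """Split s on each separator character, keeping the separators."""
    # re.split yields '' between adjacent separators and at the string's
    # ends; drop those empty strings.
    return [part for part in _SEP_RE.split(s) if part]
```

With this version each separator character stays its own item, so "a,, b" becomes ['a', ',', ',', ' ', 'b'].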
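A different, non-regex approach from the others: use itertools.groupby with a lambda as the key function that tells whether each character is a separator, then join each group of consecutive characters back into a string.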
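The answer's original code did not survive extraction. A minimal sketch of the idea, assuming the lambda tests membership in a set built from string.punctuation and string.whitespace (the names separators and split_groupby are illustrative):

```python
import string
from itertools import groupby

# All characters that should act as separators.
separators = set(string.punctuation + string.whitespace)

def split_groupby(s):
    # groupby clusters consecutive characters that share the same key
    # (True for separators, False otherwise); each cluster is joined back
    # into one string. Runs of adjacent separators end up in one item.
    return ["".join(group) for _, group in groupby(s, lambda c: c in separators)]
```

Because adjacent separators share the key True, they come out combined: "a,, b" gives ['a', ',, ', 'b'].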
Could use dict.fromkeys and .get instead of the lambda, I guess.
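A sketch of that variant, assuming every separator is mapped to the value True (the names is_separator and split_with_get are illustrative). The bound method is_separator.get then returns True for separators and None for everything else, so it can replace the lambda as the key function:

```python
import string
from itertools import groupby

# Map every separator character to True; .get returns None for any
# character that is not a key, so non-separators all share the key None.
is_separator = dict.fromkeys(string.punctuation + string.whitespace, True)

def split_with_get(s):
    # Same grouping as with the lambda: adjacent separators are combined.
    return ["".join(group) for _, group in groupby(s, is_separator.get)]
```

For example, split_with_get("Hi!! there") gives ['Hi', '!! ', 'there'].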
Some explanation:
groupby accepts two arguments: an iterable and an (optional) key function. It loops through the iterable and groups items by the value the key function returns for them, but only items with equal key values that are contiguous are grouped together. (This is actually a common source of bugs: people forget that they have to sort by the key function first if they want to group items that might not be adjacent.)
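A small illustration of the "contiguous only" behavior and the sort-first fix, using a made-up list of words grouped by first letter:

```python
from itertools import groupby

words = ["apple", "avocado", "banana", "apricot", "blueberry"]

def first_letter(w):
    return w[0]

# Without sorting, only adjacent items with equal keys are grouped,
# so the key "a" shows up twice.
unsorted_groups = [(k, list(g)) for k, g in groupby(words, first_letter)]

# Sorting by the same key first brings equal keys together. sorted() is
# stable, so items keep their original relative order within each group.
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(words, key=first_letter), first_letter)]
```

Here unsorted_groups has four groups (a, b, a, b), while sorted_groups has two: ('a', ['apple', 'avocado', 'apricot']) and ('b', ['banana', 'blueberry']). For splitting a string, the unsorted behavior is exactly what we want, since the characters must stay in order.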
As @JonClements guessed, what I had in mind was passing the .get method of a dict.fromkeys dictionary as the key function, for the case where we were combining the separators.
.get returns None if the key isn't in the dict, so every non-separator character gets the same key.
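The question asks for each separator character to act as its own separator, so combined runs such as ",, " may not be what you want. One possible extension (an assumption on my part, not from the original answer) uses the fact that the key is truthy only for separator runs, and expands those runs character by character. The names is_separator and split_each_separator are illustrative, and the sketch assumes separators map to True:

```python
import string
from itertools import groupby

# Separators map to True; .get returns None for everything else.
is_separator = dict.fromkeys(string.punctuation + string.whitespace, True)

def split_each_separator(s):
    out = []
    for key, group in groupby(s, is_separator.get):
        if key:
            # A run of one or more separators: emit each character alone.
            out.extend(group)
        else:
            # A run of ordinary characters (key None): join into one item.
            out.append("".join(group))
    return out
```

With this, "a,, b" gives ['a', ',', ',', ' ', 'b'], while the example sentence from the question still splits into words and single spaces.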