I wrote a function that is meant to improve on the built-in split() function (I know it's not idiomatic Python, but I did my best). When I call it like this:
better_split("After the flood ... all the colors came out."," .")
I’d expected this outcome:
['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
However, the function behaves in a way I can't understand. When it reaches the last two words, it no longer drops the " " between them: instead of adding "came" and "out" to the result list, it adds "came out", so I get this:
['After', 'the', 'flood', 'all', 'the', 'colors', 'came out']
Does someone with more experience understand why this happens?
Thank you in advance for any help!
    def better_split(text,markersString):
        markers = []
        splited = []
        for e in markersString:
            markers.append(e)
        for character in text:
            if character in markers:
                point = text.find(character)
                if text[:point] not in character:
                    word = text[:point]
                    splited.append(word)
                while text[point] in markers and point+1 < len(text):
                    point = point + 1
                text = text[point:]
        print 'final splited = ', splited
    better_split("This is a test-of the,string separation-code!", " ,!-")
    better_split("After the flood ... all the colors came out."," .")
split() WITH MULTIPLE SEPARATORS
If you are looking for a split() that accepts multiple separators, see:
Split Strings with Multiple Delimiters?
The best answer without import re that I found was this:
    def my_split(s, seps):
        res = [s]
        for sep in seps:
            s, res = res, []
            for seq in s:
                res += seq.split(sep)
        return res
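One thing to watch for with this approach: `str.split(sep)` returns empty strings between consecutive separators, so `my_split` leaves empty entries for runs such as " ... ". A minimal sketch, assuming the goal is the word list from the question; the filtering step at the end is an addition and not part of the quoted answer (written for Python 3):

```python
def my_split(s, seps):
    # Split every piece produced so far on the next separator.
    res = [s]
    for sep in seps:
        s, res = res, []
        for seq in s:
            res += seq.split(sep)
    return res


text = "After the flood ... all the colors came out."
pieces = my_split(text, " .")

# Consecutive separators produce empty strings, so drop them afterwards.
words = [piece for piece in pieces if piece]
print(words)
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```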
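The point is that the iterator is created once, over the original string, when this line is executed:
    for character in text:
Reassigning text inside the loop has no effect on that iterator, but your aim is to iterate over the changed text after every pass of the loop.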
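This behavior can be checked in isolation. The sketch below uses a made-up helper, `visited_characters`, that rebinds `text` on every pass and records which characters the loop actually visits (written for Python 3):

```python
def visited_characters(text):
    """Record the characters a for loop visits while text is rebound inside it."""
    seen = []
    for character in text:   # the iterator is created here, over the original string
        seen.append(character)
        text = text[1:]      # rebinding the name does not affect that iterator
    return seen, text


seen, remaining = visited_characters("abc")
print(seen)       # ['a', 'b', 'c']
print(remaining)  # prints an empty line: text itself has been shortened to ''
```

The loop still walks all of "abc" even though `text` shrinks on each pass, which is why the character being examined and the string being sliced drift apart in the question's function.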
So the solution is to move the for loop into an inner function and call it recursively:
For other details, please see the comments in the code.
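A minimal sketch of that idea, not necessarily the code this answer originally showed. The names `better_split_recursive` and `consume` are chosen here for illustration, and the sketch assumes that empty pieces should be dropped (written for Python 3):

```python
def better_split_recursive(text, markers_string):
    """Split text on any character in markers_string, dropping empty pieces."""
    markers = set(markers_string)
    words = []

    def consume(rest):
        # Each call starts a fresh for loop over the text that is still left,
        # so the loop always iterates the current string, not the original one.
        for point, character in enumerate(rest):
            if character in markers:
                if point > 0:
                    words.append(rest[:point])
                # Skip the whole run of consecutive markers.
                while point < len(rest) and rest[point] in markers:
                    point += 1
                # Continue with the remainder in a new call, then stop this loop.
                consume(rest[point:])
                return
        # No marker found: whatever remains is the last word.
        if rest:
            words.append(rest)

    consume(text)
    return words


print(better_split_recursive("After the flood ... all the colors came out.", " ."))
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```

Because there is one nested call per word, very long inputs could hit Python's recursion limit; a plain while loop over an index avoids that.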
BTW, combining built-in functions may be simpler, although I also think implementing an algorithm on your own is a good way to learn a language 🙂
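One possible way to combine built-ins, as a sketch only: replace every marker with a space and let `str.split()` do the rest. The function name `split_with_builtins` is made up here, and the sketch assumes that splitting on all whitespace is acceptable, since `split()` with no arguments also splits on tabs and newlines even when they are not listed as markers (written for Python 3):

```python
def split_with_builtins(text, markers):
    """Split text on any marker by normalizing every marker to a space first."""
    for marker in markers:
        text = text.replace(marker, " ")
    # split() with no arguments collapses runs of whitespace and drops empty pieces.
    return text.split()


print(split_with_builtins("This is a test-of the,string separation-code!", " ,!-"))
# ['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code']
```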