>>> ".a string".split('.')
['', 'a string']
>>> "a .string".split('.')
['a ', 'string']
>>> "a string.".split('.')
['a string', '']
>>> "a ... string".split('.')
['a ', '', '', ' string']
>>> "a ..string".split('.')
['a ', '', 'string']
>>> 'this  is a test'.split(' ')
['this', '', 'is', 'a', 'test']
>>> 'this  is a test'.split()
['this', 'is', 'a', 'test']
Why is split() different from split(' ') when the string being split contains only spaces as whitespace?
Why does split('.') split "a ..string" into ['a ', '', 'string']? split() does not consider there to be an empty word between 2 separators…
The docs are clear about this (see @agf below), but I’d like to know why this behaviour was chosen.
I have looked at the source code (here) and thought line 136 should just use less than: …i < str_len…
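See the str.split docs; this behavior is specifically mentioned: when a separator is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings, whereas with no separator (or None), runs of consecutive whitespace are regarded as a single separator and leading or trailing whitespace produces no empty strings.
Python tries to do what you would expect. Most people not thinking too hard would probably expect a call like 'this  is a test'.split() to return ['this', 'is', 'a', 'test'].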
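As a rough illustration of the two splitting modes the docs describe, here is a minimal sketch. The sample string and variable names are made up for this example and are not from the original post.

```python
# Sample string with a doubled space and a trailing space (illustrative only).
text = "this  is a test "

# No separator: runs of whitespace count as one separator, and
# leading/trailing whitespace produces no empty strings.
default_parts = text.split()

# Explicit separator: every single space delimits a field, so the
# doubled space and the trailing space each yield an empty string.
explicit_parts = text.split(" ")

print(default_parts)   # ['this', 'is', 'a', 'test']
print(explicit_parts)  # ['this', '', 'is', 'a', 'test', '']
```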
Think about splitting data where spaces have been used instead of tabs to create fixed-width columns: if the values have different widths, there will be a different number of spaces in each row.
There is often trailing whitespace at the end of a line that you can’t see, and the default ignores it as well, so it gives you the answer you’d visually expect.
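The space-padded column case can be sketched like this. The rows below are hypothetical data invented for the example, with one row carrying trailing whitespace.

```python
# Hypothetical rows padded with spaces to line up in columns;
# the first row also ends with invisible trailing whitespace.
rows = [
    "alice    42   london   ",
    "bob      7    rome",
]

# split() with no argument gives the same three fields per row,
# no matter how many spaces pad each column.
default_fields = [row.split() for row in rows]

# split(' ') keeps an empty string for every extra padding space,
# so the useful values are mixed in with empty strings.
explicit_fields = [row.split(" ") for row in rows]

print(default_fields)
print(explicit_fields)
```

With the default, each row comes back as exactly its visible values, which is usually what you want when the padding is only there for alignment.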
When it comes to the algorithm used when a delimiter is specified, think about a row in a CSV file such as 1,,3. It means there is data in the 1st and 3rd columns, and none in the second, so you would want '1,,3'.split(',') to return ['1', '', '3']; otherwise you wouldn’t be able to tell what column each string came from.
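To make the column argument concrete, here is a small sketch. The row and the column names are assumptions chosen for illustration only.

```python
# Hypothetical CSV row in which the second column is empty.
row = "1,,3"

# With an explicit delimiter the empty field is kept.
fields = row.split(",")
print(fields)  # ['1', '', '3']

# Because the empty field survives, positions still map to columns.
columns = ["first", "second", "third"]  # illustrative column names
record = dict(zip(columns, fields))
print(record)  # {'first': '1', 'second': '', 'third': '3'}
```

If the empty field were dropped, '3' would land in the second position and the mapping would be wrong. For real CSV data with quoting or embedded commas, the standard csv module is likely the safer choice than a bare split(',').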