I have a string separated by \t and ",", but the number of \t is not fixed. For example:
a = "seg1\tseg2\t\tseg3,seg4"
seg2 and seg3 are separated by two \t.
So I tried to split them with
a.split(/\t+|,/)
It prints the right answer:
["seg1", "seg2", "seg3", "seg4"]
I also tried this:
a.split(/[\t+,]/)
but the result is
["seg1", "seg2", "", "seg3", "seg4"]
Why does Ruby print different results?
Because \t+ inside [] does not mean “one or more tabs”; it means “a tab or a plus”. Since the string contains two consecutive tabs, split cuts at each of them, and the piece between them is an empty string.

Most special characters, such as . + * ? and so on, become “regular” characters when placed inside an interval (a character class, [...]). There are some exceptions: ^ (which negates the interval when placed at the beginning), \ (which escapes the next character(s), just like it does outside intervals), and ] (which closes the interval; another unescaped [ is also disallowed there). So [\t+,] actually means '\t' or '+' or ','.

Unfortunately, I don’t know of any reference for the full set of characters that need or don’t need escaping inside an interval. When in doubt, I tend to escape just to be sure. In any case, an interval always matches a single character only; if you want something different, you must put the quantifier outside the interval. For example: [\t,]+ if you also accept two commas in a row; otherwise, your first regex is really the correct one.
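To see the three patterns side by side, here is a small sketch. It assumes a is a plain String (not an Array), and the variable names and the extra "x+y" and "a,,b" inputs are made up for illustration:

```ruby
a = "seg1\tseg2\t\tseg3,seg4"

# Quantifier outside any class: a run of tabs, or a single comma.
runs_of_tabs = a.split(/\t+|,/)

# Inside the class, + is a literal plus sign, so every single tab
# is its own delimiter and the two adjacent tabs leave an empty string.
plus_in_class = a.split(/[\t+,]/)

# Quantifier outside the class: a run of tabs and/or commas.
class_then_plus = a.split(/[\t,]+/)

# The + inside the class really is literal: it splits on a plus sign.
literal_plus = "x+y".split(/[\t+,]/)

# Where the first and third patterns differ: two commas in a row.
double_comma_alt   = "a,,b".split(/\t+|,/)
double_comma_class = "a,,b".split(/[\t,]+/)

p runs_of_tabs        # ["seg1", "seg2", "seg3", "seg4"]
p plus_in_class       # ["seg1", "seg2", "", "seg3", "seg4"]
p class_then_plus     # ["seg1", "seg2", "seg3", "seg4"]
p literal_plus        # ["x", "y"]
p double_comma_alt    # ["a", "", "b"]
p double_comma_class  # ["a", "b"]
```

So the choice between /\t+|,/ and /[\t,]+/ comes down to whether an empty field between two commas should be kept or collapsed.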