I am trying to learn regex in PHP and messing around with the preg_split function.
It doesn’t appear to be correct though, or my understanding is completely wrong.
The test code I am using is:
$string = "test ing ";
var_dump(preg_split('/t/', $string));
I would expect to get an array like the following:
[0] => "es" [1] => " ing "
but the following is being returned:
[0] => "" [1] => "es" [2] => " ing "
Why is there an empty string at the start?
I understand that I can use the PREG_SPLIT_NO_EMPTY flag to filter this, but it shouldn’t be there to begin with. Should it?
Why shouldn’t it? This is exactly how it works. The semantics of a split operation are that you have a string of this format:
value delimiter value delimiter value
(Note that it starts and ends with a value, not a delimiter.)
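One way to convince yourself that the leading empty string belongs there is a round-trip check: joining the pieces back together with the delimiter should reproduce the original string. The following is only a small sketch using the string from the question; the variable names $parts and $rejoined are made up for illustration.

```php
<?php
// The string and pattern from the question.
$string = "test ing ";
$parts = preg_split('/t/', $string);

// Gluing the pieces back together with the delimiter 't'.
// This only reproduces the input if the leading empty string is kept.
$rejoined = implode('t', $parts);

var_dump($parts);
var_dump($rejoined === $string);
```

If the empty first element were dropped, the rejoined string would be missing its leading t.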
So if your string starts with a delimiter, it is absolutely valid to assume that there is an empty value before that delimiter (since the delimiter is supposed to split something into two). You wouldn’t generally want to reject the empty string between two consecutive t's either, would you?
And this is exactly what PREG_SPLIT_NO_EMPTY is for. You use it whenever you do want to get rid of those empty strings.
As a simple example of why you would want the default behavior, just think of CSV files. You want to split a line at, for example, a semicolon (;). You usually also want to allow for empty values. Now if the value in your first column was empty (meaning the line will start with ;), and you chopped that first empty string away completely, then suddenly all indices in the resulting array would correspond to different columns. This is why you want to keep those empty strings as well. In many cases you know how many delimiters there are, and hence how many values, and you want to be able to identify which value belongs at which position, even if some of them are empty.
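To make the CSV argument concrete, here is a minimal sketch comparing the default behavior with PREG_SPLIT_NO_EMPTY. The sample line and the variable names are invented for illustration, and real CSV data is usually better handled with a dedicated parser than with a plain split.

```php
<?php
// Hypothetical semicolon-separated line whose first and third columns are empty.
$line = ';alice;;42';

// Default: empty fields are kept, so each index still matches its column.
$withEmpty = preg_split('/;/', $line);

// PREG_SPLIT_NO_EMPTY drops the empty fields, so the remaining values
// shift to lower indices and no longer line up with their columns.
$withoutEmpty = preg_split('/;/', $line, -1, PREG_SPLIT_NO_EMPTY);

var_dump($withEmpty);    // four elements: "", "alice", "", "42"
var_dump($withoutEmpty); // two elements: "alice", "42"
```

In the first result, "42" sits at index 3, matching the fourth column; in the second it has moved to index 1, which is the kind of shift the default behavior avoids.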