How to split a multibyte string into words in PHP?
Here is what I have done so far, but I would like to improve the code…
mb_internal_encoding( 'UTF-8');
mb_regex_encoding( 'UTF-8');
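// Split on whitespace, brackets, and punctuation; the hyphen is placed last so it is literal rather than forming a ":" to "_" range.
$arr = mb_split( '[\s\[\]().,;:_-]', $str );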
Is there a way to say that a word is a sequence of “alpha” characters (without using the a-z notation, since I would like to include non-Latin characters)?
Try this baby here:
Matches all possible letters, including accented ones, as words:
See it.
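For reference, here is a minimal sketch of the "a word is a run of letters" idea. It assumes the PCRE Unicode properties `\p{L}` (any letter, any script) and `\p{M}` (combining marks) together with the `u` modifier; this is one way to do it and not necessarily the exact pattern this answer used. The helper name `split_words` is made up for the example.

```php
<?php
// Hypothetical helper (not from the original post): returns the words of a
// UTF-8 string, where a word is a run of Unicode letters.
function split_words(string $str): array
{
    // \p{L} = any letter in any script; \p{M} = combining marks, so that a
    // decomposed accent (e.g. "e" followed by U+0301) stays with its letter.
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8.
    if (preg_match_all('/\p{L}[\p{L}\p{M}]*/u', $str, $matches) === false) {
        return []; // matching failed, e.g. the input was not valid UTF-8
    }
    return $matches[0];
}

print_r(split_words('Žluťoučký kůň, naïve café: привет-мир'));
```

Instead of listing separators as `mb_split` requires, this collects the words directly, so digits, punctuation, and underscores all act as separators without being named. If digits should count as part of a word, adding `\p{N}` to the character class would be the natural extension.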