I’m trying to get the words from a string in PHP using preg_split, like this:
$result = preg_split('/[^A-Za-z]+/', $text);
but it doesn’t work: some words get split apart.
What am I doing wrong?
Edit: specifically, it doesn’t work with Russian text:
$text = "фыва ывафы фываф";
$result = preg_split('/[^А-яа-я]+/', $text);
[^A-Za-z] only takes ASCII letters into account. You need to split on Unicode non-letters, for example with \P{L}+ together with the u modifier, which makes PCRE treat both the pattern and the subject as UTF-8 instead of as raw bytes.

[^А-Яа-я]+ won’t work either, because in Unicode А (U+0410) is not the first Cyrillic letter and я (U+044F) is not the last one. The Cyrillic block spans U+0400 to U+04FF, and even the Russian letters Ё (U+0401) and ё (U+0451) fall outside the А-я range. I don’t know Russian at all, so I can’t speculate on why this is so. You can check this easily in your character map program.
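A minimal sketch of the Unicode-aware split, assuming the input is UTF-8 and PHP's bundled PCRE. The PREG_SPLIT_NO_EMPTY flag is an extra not mentioned above; it simply drops empty pieces when the text starts or ends with a separator. The loop at the end checks the point about the Cyrillic range directly in PHP rather than in a character map.

```php
<?php
// Sketch: split UTF-8 text into words on runs of non-letters.

$text = "фыва ывафы фываф";

// \P{L} matches any code point that is not a Unicode letter; the u
// modifier makes PCRE read the pattern and subject as UTF-8, not bytes.
// PREG_SPLIT_NO_EMPTY discards empty strings at the start or end.
$words = preg_split('/\P{L}+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($words);

// Even with the u modifier, [А-Яа-я] (U+0410..U+044F) leaves out
// Ё (U+0401) and ё (U+0451), while \p{L} matches all three letters.
foreach (['ж', 'Ё', 'ё'] as $ch) {
    printf(
        "%s  [А-Яа-я]: %d  \\p{L}: %d\n",
        $ch,
        preg_match('/^[А-Яа-я]$/u', $ch),
        preg_match('/^\p{L}$/u', $ch)
    );
}
```

With the sample text this yields the three words фыва, ывафы and фываф, and the loop reports 0 for Ё and ё under the explicit range but 1 under \p{L}.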