Thanks for the answers to “regular expression to detect numbers written as words”.
I now have this working. However, I have the same requirement where the number words are in Arabic (or any other UTF-8 text) rather than English, so:
if (preg_match("/\p{L}\b(?:(?:واحد|اثنان|ثلاثة|أربعة|خمسة|ستة|سبعة|ثمانية|تسعة|صفر|عشرة)\b\s*?){4}/", $str, $matches) > 0)
return true;
This does not work. I’ve googled, and there seem to be quite a few issues with preg_match and UTF-8 strings, but I couldn’t get any of the suggestions I found to work. Any help much appreciated.
Note that \b may not be working as you expect. \b specifies a word boundary, but what PCRE considers a word character depends on the locale the script is running in (take a look towards the bottom of the PCRE escape sequences manual page).
You might also want to read Handling UTF-8 with PHP (the section on PCRE in particular).
Instead, you could use a lookaround in conjunction with a Unicode character property to emulate a word boundary: (?<=\P{L}). This asserts that the previous character is not a Unicode “letter”.
So all together it would look like:
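A minimal sketch of how these pieces might fit together, not necessarily the exact pattern originally posted. It assumes the script file and the input are UTF-8, and it adds the /u modifier so PCRE treats both pattern and subject as UTF-8. It also swaps in negative lookarounds, (?<!\p{L}) and (?!\p{L}), in place of (?<=\P{L}), so that a number word at the very start or end of the string still counts. The function name hasFourNumberWords is hypothetical.

```php
<?php
// Hypothetical helper: true if $str contains four consecutive Arabic
// number words, each delimited by non-letters (or the string edges).
// Assumes this source file and $str are both UTF-8 encoded.
function hasFourNumberWords(string $str): bool
{
    $words = 'واحد|اثنان|ثلاثة|أربعة|خمسة|ستة|سبعة|ثمانية|تسعة|صفر|عشرة';

    // (?<!\p{L})  no Unicode letter immediately before the first word
    // (?!\p{L})   no Unicode letter immediately after each word
    // /u          treat pattern and subject as UTF-8
    $pattern = '/(?<!\p{L})(?:(?:' . $words . ')(?!\p{L})\s*){4}/u';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $str) === 1;
}

var_dump(hasFourNumberWords('واحد اثنان ثلاثة أربعة'));
var_dump(hasFourNumberWords('واحد اثنان ثلاثة'));
```

Comparing the result strictly against 1 also treats a preg_match error (for example, a subject that is not valid UTF-8) as "no match"; check preg_last_error() if you need to tell the two cases apart.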