Need some help from a regex Jedi master:
If I have a string of multibyte characters (specifically Japanese, Korean, or Chinese) with English words sprinkled throughout, I would like to count:
- Asian characters as 1 per single character
- English “words” (no dictionary check needed, just a run of consecutive English letters) as a single character.
English only is fine; don’t worry about special Spanish, Swedish, etc. characters.
I am looking for a regex pattern I can use to count these strings that will work in both PHP and JS.
Example:
これは猫です、けどKittyも大丈夫。
should count as 13 chars.
thanks for your help!
jeff
Whatever you are trying to achieve, this should help:
To count only Hiragana, Katakana, and Kanji (Japanese) characters, excluding punctuation marks:
Updated:
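A minimal sketch of such a pattern in JavaScript, assuming the Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF), and common Kanji (U+4E00 to U+9FFF) blocks are the ones intended. The names `JAPANESE_CHAR` and `countJapaneseChars` are hypothetical, chosen for illustration:

```javascript
// Assumed ranges: Hiragana, Katakana, and CJK Unified Ideographs (Kanji).
// Punctuation such as 、 (U+3001) and 。 (U+3002) lies outside these ranges.
const JAPANESE_CHAR = /[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]/g;

// Returns the number of characters in `text` that fall in the ranges above.
function countJapaneseChars(text) {
  const matches = text.match(JAPANESE_CHAR);
  // String.prototype.match returns null when a global regex finds nothing.
  return matches === null ? 0 : matches.length;
}

console.log(countJapaneseChars("これは猫です、けどKittyも大丈夫。")); // 12
```

For the example string this gives 12: every Japanese character except the two punctuation marks, with "Kitty" ignored.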
To count only words written in the Latin alphabet:
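One way this could look in JavaScript, assuming a "word" is any run of consecutive unaccented letters A to Z in either case, as the question describes. The names `ENGLISH_WORD` and `countEnglishWords` are hypothetical:

```javascript
// A "word" here is one or more consecutive ASCII letters (no dictionary check).
const ENGLISH_WORD = /[A-Za-z]+/g;

// Returns the number of letter runs found in `text`.
function countEnglishWords(text) {
  const matches = text.match(ENGLISH_WORD);
  return matches === null ? 0 : matches.length;
}

console.log(countEnglishWords("これは猫です、けどKittyも大丈夫。")); // 1
```

Note that anything that is not a letter ends a run, so with this sketch "don't" would count as two words.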
All in one line (as a function):
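A hedged sketch of a combined version in JavaScript: a single alternation that matches either a run of ASCII letters or one Japanese character, so each match counts as 1. The function name `countMixed` and the exact ranges are assumptions, not necessarily the pattern originally posted:

```javascript
// Counts each run of ASCII letters as 1 and each Hiragana/Katakana/Kanji character as 1.
function countMixed(text) {
  return (text.match(/[A-Za-z]+|[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]/g) || []).length;
}

console.log(countMixed("これは猫です、けどKittyも大丈夫。")); // 13
```

This yields the 13 asked for in the question. In PHP the same idea should carry over to `preg_match_all` with the `u` modifier, writing the code points as `\x{3040}`-style escapes instead of `\u3040`.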
These are the arrays returned by the match:
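To illustrate what the match arrays contain, here is a small JavaScript sketch run against the example string from the question. The variable names and the ranges are assumptions for illustration:

```javascript
const sample = "これは猫です、けどKittyも大丈夫。";

// One array entry per Hiragana/Katakana/Kanji character (punctuation is skipped).
const japaneseMatches = sample.match(/[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]/g) || [];

// One array entry per run of ASCII letters.
const wordMatches = sample.match(/[A-Za-z]+/g) || [];

console.log(japaneseMatches); // こ, れ, は, 猫, で, す, け, ど, も, 大, 丈, 夫
console.log(wordMatches);     // Kitty
```

The total count is then simply the sum of the two array lengths.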
Updated (JAP, KOR, CH):
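A minimal JavaScript sketch of the extended version, assuming the following Unicode blocks are the relevant ones: Hangul Jamo (U+1100 to U+11FF), Hiragana and Katakana (U+3040 to U+30FF), Hangul Compatibility Jamo (U+3130 to U+318F), CJK Extension A (U+3400 to U+4DBF), CJK Unified Ideographs (U+4E00 to U+9FFF), and Hangul Syllables (U+AC00 to U+D7AF). The names `CJK_OR_WORD` and `countCjkMixed` are hypothetical:

```javascript
// Either a run of ASCII letters, or a single character from the assumed
// Japanese, Chinese, and Korean blocks listed above.
const CJK_OR_WORD = new RegExp(
  "[A-Za-z]+|[" +
    "\\u1100-\\u11FF" + // Hangul Jamo
    "\\u3040-\\u30FF" + // Hiragana and Katakana
    "\\u3130-\\u318F" + // Hangul Compatibility Jamo
    "\\u3400-\\u4DBF" + // CJK Unified Ideographs Extension A
    "\\u4E00-\\u9FFF" + // CJK Unified Ideographs
    "\\uAC00-\\uD7AF" + // Hangul Syllables
  "]",
  "g"
);

// Counts each English word as 1 and each CJK character as 1.
function countCjkMixed(text) {
  return (text.match(CJK_OR_WORD) || []).length;
}

console.log(countCjkMixed("これは猫です、けどKittyも大丈夫。")); // 13
console.log(countCjkMixed("고양이 cat 猫")); // 5
```

Characters outside the Basic Multilingual Plane (for example the rarer ideographs in CJK Extension B) are not covered by this sketch.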
These ranges will cover around 99% of Japanese, Chinese, and Korean text. You may need to manually add extra characters that fall outside them, such as “〶”.
A very good reference is:
http://www.tamasoft.co.jp/en/general-info/unicode.html
This should solve your question.