Given a string in Ruby 1.8.7 (without the awesome Oniguruma regular expression engine that

Question

0

Asked: May 19, 20262026-05-19T05:47:44+00:00 2026-05-19T05:47:44+00:00

Given a string in Ruby 1.8.7 (without the awesome Oniguruma regular expression engine that

0

Given a string in Ruby 1.8.7 (without the awesome Oniguruma regular expression engine that supports Unicode properties with \p{}), I would like to be able to determine if the string contains one or more Chinese, Japanese, or Korean characters; i.e.

class String
  def contains_cjk?
    ...
  end
end

>> '日本語'.contains_cjk?
=> true
>> '광고 프로그램'.contains_cjk?
=> true
>> '艾弗森将退出篮坛'.contains_cjk?
=> true
>> 'Watashi ha bakana gaijin desu.'.contains_cjk?
=> false

I suspect that this will boil down to seeing if any of the characters in the string are in the Unihan CJKV Unicode blocks, but I figured it was worth asking if anyone knows of an existing solution in Ruby.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-19T05:47:45+00:00

(ruby 1.9.2)

#encoding: UTF-8
class String
  def contains_cjk?
    !!(self =~ /\p{Han}|\p{Katakana}|\p{Hiragana}|\p{Hangul}/)
  end
end

strings= ['日本', '광고 프로그램', '艾弗森将退出篮坛', 'Watashi ha bakana gaijin desu.']
strings.each{|s| puts s.contains_cjk?}

#true
#true
#true
#false

\p{} matches a character’s Unicode script.

The following scripts are supported: Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lycian, Lydian, Malayalam, Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Saurashtra, Shavian, Sinhala, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai, and Yi.

Wow. Ruby Regexp source .

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Given a string in Ruby 1.8.7 (without the awesome Oniguruma regular expression engine that

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply