The Archive Base


Editorial Team
Asked: May 15, 2026

Some time in the near future I will need to implement a cross-language word

Some time in the near future I will need to implement a cross-language word count, or if that is not possible, a cross-language character count.

By word count I mean an accurate count of the words contained within the given text, taking the language of the text into account. The language of the text is set by the user and will be assumed to be correct.

By character count I mean a count of the “possibly in a word” characters contained within the given text, with the same language information described above.

I would much prefer the former count if it is at all possible, although I am aware of the difficulties involved and that the latter count is much easier.

I’d love it if I only had to look at English, but I need to consider every language here: Chinese, Korean, English, Arabic, Hindi, and so on.

I would like to know if Stack Overflow has any leads on where to start looking for an existing product or method to do this in PHP, as I am a good lazy programmer.*

For reference: a simple test showing that str_word_count with setlocale doesn’t work, and a function from php.net’s str_word_count page.

*http://blogoscoped.com/archive/2005-08-24-n14.html

1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 7:26 am

    Counting chars is easy:

    echo strlen('一个有十的字符的句子'); // 30 (WRONG! strlen counts bytes)
    echo mb_strlen('一个有十的字符的句子', 'UTF-8'); // 10 (utf8_decode() is deprecated since PHP 8.2)
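The question asks for a count of "possibly in a word" characters rather than all characters. A minimal sketch of one way to read that, assuming it means Unicode letters, combining marks and numbers; the helper name `count_word_chars` and that choice of categories are assumptions for illustration, not something the question specifies:

```php
<?php
// Hypothetical helper: counts code points that can plausibly be part of a word.
// Assumption: "word characters" = Unicode letters (L), marks (M) and numbers (N).
function count_word_chars(string $text): int
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $n = preg_match_all('~[\p{L}\p{M}\p{N}]~u', $text);
    if ($n === false) {
        // preg_match_all() returns false on error, e.g. malformed UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $n;
}

echo count_word_chars('Hello, world!'), "\n";          // 10
echo count_word_chars('一个有十的字符的句子。'), "\n"; // 10 (the full stop is not counted)
```

This counts code points, so a letter followed by a combining mark counts as two; whether that is what you want depends on the language.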
    

    Counting words is where things get tricky, especially for Chinese, Japanese and other languages that don’t use spaces (or other common “word boundary” characters) as word separators. I don’t speak Chinese and I don’t understand how word counting works in it, so you’ll have to educate me a bit: what makes a word in these languages? Is it a specific char or set of chars? I remember reading something about how hard it was to identify Japanese words in T9 input, but I can’t find it anymore.

    The following should correctly return the number of words in languages that use spaces or punctuation chars as word separators:
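One possible lead for languages without spaces is ICU's word break iterator, which PHP exposes as `IntlBreakIterator` in the intl extension. The sketch below assumes that extension is installed; the helper name `count_words_icu` is hypothetical. How ICU segments Chinese or Japanese depends on its bundled dictionaries, so treat those results as an approximation, not an authoritative count:

```php
<?php
// Hypothetical helper: counts word-like segments reported by ICU.
// Assumption: the intl extension is available.
function count_words_icu(string $text, string $locale): int
{
    $it = IntlBreakIterator::createWordInstance($locale);
    if ($it === null) {
        throw new RuntimeException('Could not create a word break iterator');
    }
    $it->setText($text);
    $it->first();

    $count = 0;
    while ($it->next() !== IntlBreakIterator::DONE) {
        // The rule status classifies the segment that ends at this boundary.
        // Anything below WORD_NONE_LIMIT is whitespace or punctuation.
        if ($it->getRuleStatus() >= IntlBreakIterator::WORD_NONE_LIMIT) {
            $count++;
        }
    }
    return $count;
}

if (extension_loaded('intl')) {
    echo count_words_icu('Hello, world!', 'en'), "\n"; // 2
    // Result depends on ICU's dictionary, so no expected value is given here.
    echo count_words_icu('一个有十的字符的句子', 'zh'), "\n";
}
```

The locale string would come from the user-supplied language mentioned in the question.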

    // -1 means "no limit"; \s also catches tabs and newlines, which \p{Z} does not cover
    count(preg_split('~[\s\p{Z}\p{P}]+~u', $string, -1, PREG_SPLIT_NO_EMPTY));
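To show where the split approach works and where it doesn't, here is a small sketch that wraps it in a function. The name `count_words_split` is hypothetical, and adding `\s` to the character class is an assumption so that tabs and newlines also separate words:

```php
<?php
// Hypothetical wrapper around the split-on-separators approach.
function count_words_split(string $text): int
{
    // Split on runs of whitespace, Unicode separators (Z) and punctuation (P).
    $parts = preg_split('~[\s\p{Z}\p{P}]+~u', $text, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on error, e.g. malformed UTF-8 input.
    return $parts === false ? 0 : count($parts);
}

echo count_words_split('Hello, world!'), "\n";        // 2
echo count_words_split("don't stop"), "\n";           // 3 (the apostrophe counts as punctuation)
echo count_words_split('一个有十的字符的句子'), "\n"; // 1 (no separators, so the whole sentence is one "word")
```

The last two lines show the limits: contractions are split in two, and text without separators collapses to a single word.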
    
© 2021 The Archive Base. All Rights Reserved