Question

Editorial Team

Asked: May 10, 2026

Is there a way (a program, a library) to approximately know which language a document is written in?

I have a bunch of text documents (~500K) in mixed languages to import into an i18n-enabled CMS (Drupal).

I don’t need perfect matches, just a reasonable guess.

1 Answer

score 0 · Answer 1 · 2026-05-10T21:19:53+00:00

Added an answer on May 10, 2026 at 9:19 pm

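There is a fairly easy way to do this, provided you have corpus data in every language you need to identify: n-gram modeling. You build a profile of character n-gram frequencies for each language from its corpus, then assign each document to the language whose profile it matches most closely. I think the Perl module Lingua::Identify does this already, though, so using it is a better bet than implementing your own.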
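To make the n-gram idea concrete, here is a minimal Python sketch. It is not how Lingua::Identify works internally. The function names (`trigrams`, `cosine`, `guess_language`), the choice of character trigrams with cosine similarity, and the one-sentence training samples are all assumptions made for illustration. In practice each profile would be built from a much larger corpus per language.

```python
from collections import Counter
import math


def trigrams(text):
    """Count overlapping character trigrams in lowercased, space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy training samples (assumption): far too small for real use,
# replace with real corpus text for each language you need to detect.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is a "
          "document written in the english language",
    "fr": "le renard brun saute par-dessus le chien paresseux et ceci est "
          "un document écrit dans la langue française",
    "de": "der schnelle braune fuchs springt über den faulen hund und dies "
          "ist ein dokument in der deutschen sprache",
}

# One trigram profile per language, built once up front.
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Return the language code whose profile is most similar to the text."""
    doc = trigrams(text)
    scores = {lang: cosine(doc, profile) for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get)
```

Calling `guess_language(document_text)` on each document gives a rough label to attach during import. Because only a guess is needed, the similarity score could also be used to flag low-confidence documents for manual review.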
