I’m wondering is there an algorithm or a library which helps me identify the


Asked: May 30, 2026


I’m wondering whether there is an algorithm or a library that can help me identify the parts of an English sentence that have no meaning, e.g., very serious grammar errors. If so, could you explain how it works? I would really like to implement it or use it in my own projects.

Here’s a random example:

In the sentence: “I closed so etc page hello the door.”

As humans, we can quickly identify that [so etc page hello] does not make any sense. Is it possible for a machine to point out that the string does not make any sense and also contains grammar errors?

If there’s such a solution, how precise can it be? Is it possible, for example, that given a clip of an English sentence, the algorithm returns a measure indicating how meaningful or correct that clip is? Thank you very much!

PS: I’ve looked at CMU’s link grammar as well as the NLTK library. But I’m still not sure how to use, for example, the link grammar parser to do what I want: if the parser doesn’t accept the sentence, I don’t know how to tweak it to tell me which part is wrong. I’m also not sure whether NLTK supports that.

Another thought I had towards solving the problem is to look at the frequencies of word combinations, since I’m currently interested in correcting very serious errors only. I could define a “serious error” as a case where the words in a clip of a sentence are rarely used together, i.e., the frequency of the combo is much lower than those of the other combos in the sentence.

For instance, in the above example, the four words [so etc page hello] really seldom occur together. The intuition comes from the fact that when I type such a combo into Google, no related results come up. So is there any library that provides such frequency information like Google does? Such frequencies may give a good hint about the correctness of a word combo.


1 Answer

Editorial Team · Answer 1 · 2026-05-30T07:22:00+00:00

I think that what you are looking for is a language model. A language model assigns a probability to any sequence of k words in your language. The simplest language models are n-gram models: given the first i words of a sentence, the probability of observing word i+1 depends only on the previous n-1 words.

For example, for a bigram model (n=2), the probability of the sentence w1 w2 ... wk is equal to

P(w1 ... wk) = P(w1) P(w2 | w1) ... P(wk | w(k-1)).
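As a worked instance of the formula above, here it is spelled out for the four-word sentence “I closed the door” (k = 4). The choice of sentence is only an illustration, taken from the question’s example with the nonsense words removed:

```latex
P(\text{I closed the door})
  = P(\text{I}) \, P(\text{closed} \mid \text{I}) \,
    P(\text{the} \mid \text{closed}) \, P(\text{door} \mid \text{the})
```

In the garbled version of the sentence, factors such as P(so | closed) or P(page | etc) would be very small, which is what pulls the overall probability down.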

To compute the probabilities P(wi | w(i-1)), you just have to count the number of occurrences of the bigram w(i-1) wi and of the word w(i-1) in a large corpus, then divide the first count by the second.
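The counting step can be sketched in a few lines of Python. This is only a minimal illustration under several assumptions that are not part of the answer: the four-sentence corpus is a toy stand-in for a large corpus, the `<s>` start marker replaces the P(w1) term with P(w1 | `<s>`), add-alpha smoothing is added so that unseen bigrams do not get probability zero, and the function names and the threshold are hypothetical.

```python
import math
import re
from collections import Counter

START = "<s>"  # assumed sentence-start marker (not in the formula above)


def tokenize(sentence):
    """Lowercase, keep letters and apostrophes only, and prepend the start marker."""
    return [START] + re.findall(r"[a-z']+", sentence.lower())


def train(corpus):
    """Count single words and adjacent word pairs over an iterable of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, alpha=1.0):
    """Estimate P(word | prev) from counts, with add-alpha smoothing (an assumption)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)


def score(sentence, unigrams, bigrams):
    """Return a list of ((prev, word), probability) for each adjacent pair."""
    tokens = tokenize(sentence)
    return [((p, w), bigram_prob(p, w, unigrams, bigrams))
            for p, w in zip(tokens, tokens[1:])]


def log_prob(sentence, unigrams, bigrams):
    """Log of the sentence probability: the sum of the bigram log-probabilities."""
    return sum(math.log(p) for _, p in score(sentence, unigrams, bigrams))


def suspicious(sentence, unigrams, bigrams, threshold):
    """Return the word pairs whose estimated probability falls below the threshold."""
    return [pair for pair, p in score(sentence, unigrams, bigrams) if p < threshold]


# Toy corpus, purely illustrative; a real model needs far more text.
corpus = [
    "I closed the door",
    "I opened the door",
    "She closed the page",
    "He closed the window",
]
unigrams, bigrams = train(corpus)
print(suspicious("I closed so etc page hello the door.", unigrams, bigrams, threshold=0.15))
```

On this toy corpus the flagged pairs run from “closed so” through “hello the”, which brackets the nonsense span in the question’s example. With real data the threshold would have to be tuned, and a better smoothing method than add-alpha would likely be needed.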

Here is a good tutorial paper on the subject: A Bit of Progress in Language Modeling, by Joshua Goodman.
