The Archive Base


The Archive Base Latest Questions

Editorial Team
Asked: June 3, 2026 (2026-06-03T21:45:50+00:00)


I am new to machine learning and to computing probabilities. This is an example from LingPipe for adding syllabification to words using training data.

Given a source model p(h) for hyphenated words, and a channel model p(w|h) defined so that p(w|h) = 1 if w is equal to h with the hyphens removed and 0 otherwise, we seek to find the most likely source message h to have produced message w by:

    ARGMAX_h p(h|w) = ARGMAX_h p(w|h) p(h) / p(w)
                    = ARGMAX_h p(w|h) p(h)
                    = ARGMAX_{h s.t. strip(h)=w} p(h)

where we use strip(h) = w to mean that w is equal to h with the hyphenations stripped out (in Java terms, h.replaceAll(" ","").equals(w)). Thus with a deterministic channel, we wind up looking for the most likely hyphenation h according to p(h), restricting our search to h that produce w when the hyphens are stripped out. 

I do not understand how to use it to build a syllabification model.

If there is a training set containing:

a bid jan
a bide
a bie
a bil i ty
a bim e lech

How do I build a model that will syllabify words? I mean, what needs to be computed in order to find the possible syllable breaks of a new word?

What should be computed first, and what next? Could you please be specific, with an example?

Thanks a lot.


1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 9:45 pm (2026-06-03T21:45:52+00:00)

    The method described in the article is the noisy channel model, which uses Bayes' rule to recover the most likely original value from a noisy observation. In other words, the non-syllabified word, such as picnic, is treated as the noisy observation, and the goal is to find the most probable source value, which is pic-nic.
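    To make "compute what first, then what" concrete, here is a minimal Python sketch. It is not LingPipe's actual implementation. It assumes the source model p(h) is a character bigram model with add-one smoothing (my simplification), trained on the hyphenated words from the question, where a space marks a break. The names train_bigrams, log_p, candidates and most_likely_hyphenation are hypothetical. Step one is counting bigrams over the training data. Step two is enumerating every h with strip(h) = w and keeping the one with the highest p(h).

```python
import math
from collections import Counter
from itertools import product

# Training data from the question: a space marks a syllable break.
TRAINING = ["a bid jan", "a bide", "a bie", "a bil i ty", "a bim e lech"]
START, END = "^", "$"  # hypothetical word-boundary markers


def train_bigrams(hyphenated_words):
    """Step 1: count character bigrams over hyphenated words (breaks included)."""
    bigrams, contexts, alphabet = Counter(), Counter(), {END}
    for h in hyphenated_words:
        chars = [START] + list(h) + [END]
        alphabet.update(chars[1:])
        for prev, cur in zip(chars, chars[1:]):
            bigrams[prev, cur] += 1
            contexts[prev] += 1
    return bigrams, contexts, alphabet


def log_p(h, model):
    """log p(h) under the bigram model, with add-one smoothing (an assumption)."""
    bigrams, contexts, alphabet = model
    chars = [START] + list(h) + [END]
    return sum(
        math.log((bigrams[prev, cur] + 1) / (contexts[prev] + len(alphabet)))
        for prev, cur in zip(chars, chars[1:])
    )


def candidates(w):
    """Every h with strip(h) == w: each gap between letters has a break or not."""
    for breaks in product(["", " "], repeat=max(len(w) - 1, 0)):
        yield "".join(c + b for c, b in zip(w, breaks + ("",)))


def most_likely_hyphenation(w, model):
    """Step 2: ARGMAX of p(h) over all h such that strip(h) = w."""
    return max(candidates(w), key=lambda h: log_p(h, model))


model = train_bigrams(TRAINING)
print(most_likely_hyphenation("abide", model))
```

    The brute-force enumeration grows as 2^(n-1) for a word of n letters, so a practical version would search with dynamic programming instead, and five training words are far too few for the scores to mean much.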

    Here is an excellent video lesson on this very topic (skip to 1:25, though the whole set of lectures is worth watching).

    This method is especially useful for word segmentation, but some use it for syllabification as well. Written Chinese puts delimiters only between logical constructs, and most words follow one another with no delimiter. However, each character is, almost without exception, one syllable.

    Other languages have more complicated writing systems. For instance, Thai has no spaces between words, and each syllable may be built from several symbols, e.g. สวัสดี -> ส-วัส-ดี. Rule-based syllabification may be hard there, but it is possible.

    As for English, I would not bother with Markov chains and n-grams, and would instead use a few simple rules that give a pretty good (though not perfect) match ratio:

    1. Two consonants between two vowels (VCCV): split between them (VC-CV), as in cof-fee, pic-nic, except when they form a “cluster consonant” that represents a single sound: meth-od, Ro-chester, hang-out
    2. Three or more consonants between two vowels (VCCCV): split keeping the blends together, as in mon-ster or chil-dren (this seems the most difficult case, as you cannot avoid a dictionary)
    3. One consonant between two vowels (VCV): split after the first vowel (V-CV), as in ba-con, a-rid
    4. The rule above also has exceptions based on blends: cour-age, play-time
    5. Two vowels together (VV): split between them, unless they form a “cluster vowel”: po-em, but glacier, earl-ier

    I would start with the “main” rules, and then cover them with “guard” rules that prevent cluster vowels and cluster consonants from being split. There would also be an obvious guard rule to prevent a single consonant from becoming a syllable on its own. Once that is done, I would add another guard rule based on a dictionary.
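    The main and guard rules above could be sketched in Python roughly as follows. This is an illustration under loud assumptions: the cluster, digraph and blend lists are short hypothetical samples, every y is treated as a vowel, silent e is ignored, and the function names and the exceptions parameter are mine.

```python
import re

VOWELS = "aeiouy"  # simplification: every y counts as a vowel
# Hypothetical, deliberately short lists; real ones would be longer.
VOWEL_CLUSTERS = {"ai", "au", "ay", "ea", "ee", "ei", "ey", "ie",
                  "oa", "oi", "oo", "ou", "oy", "ue"}
DIGRAPHS = {"ch", "ck", "ng", "ph", "sh", "th"}
NO_ONSET = {"ck", "ng"}  # digraphs that cannot start a syllable
BLENDS = {"bl", "br", "cl", "cr", "dr", "fl", "fr", "gl", "gr", "pl", "pr",
          "sc", "sk", "sl", "sm", "sn", "sp", "st", "sw", "tr", "tw"}
ONSET_PAIRS = BLENDS | (DIGRAPHS - NO_ONSET)


def split_consonants(run):
    """Return (coda, onset) for a consonant run between two vowels."""
    if run in DIGRAPHS:  # guard: never split a cluster consonant
        return (run, "") if run in NO_ONSET else ("", run)
    if len(run) == 1:  # main rule VCV -> V-CV
        return "", run
    if len(run) == 2:  # main rule VCCV -> VC-CV
        return run[0], run[1]
    # main rule VCCCV: keep a trailing blend or digraph with the next syllable
    keep = 2 if run[-2:] in ONSET_PAIRS else 1
    return run[:-keep], run[-keep:]


def syllabify(word, exceptions=None):
    word = word.lower()
    if exceptions and word in exceptions:  # dictionary guard rule
        return exceptions[word]
    # Alternating runs of vowels and consonants: monster -> m, o, nst, e, r
    runs = re.findall(rf"[{VOWELS}]+|[^{VOWELS}]+", word)
    syllables, current = [], ""
    for i, run in enumerate(runs):
        if run[0] in VOWELS:
            for j, ch in enumerate(run):
                # main rule VV -> V-V, guarded by the cluster-vowel list
                if j > 0 and run[j - 1] + ch not in VOWEL_CLUSTERS:
                    syllables.append(current)
                    current = ""
                current += ch
        elif i == 0 or i == len(runs) - 1:
            # guard: edge consonants never form a syllable on their own
            current += run
        else:
            coda, onset = split_consonants(run)
            syllables.append(current + coda)
            current = onset
    if current:
        syllables.append(current)
    return "-".join(syllables)


for w in ["coffee", "picnic", "monster", "bacon", "arid", "poem", "hangout"]:
    print(w, "->", syllabify(w))
```

    Without a dictionary entry this sketch gives me-thod rather than meth-od, which is exactly the kind of case the dictionary guard is for: syllabify("method", {"method": "meth-od"}) returns meth-od.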
