Editorial Team
Asked: June 5, 2026 (2026-06-05T01:56:18+00:00)

I am trying to develop a system for audio classification in java using mfcc

I am trying to develop a system for audio classification in Java using MFCC features and hidden Markov models (HMMs). I am following this research paper: http://acccn.net/cr569/Rstuff/keys/bathSoundMonitoring.pdf .

It describes the algorithm as follows:

Each sound file, corresponding to a sample of a sound event, was processed in
frames pre-emphasized and windowed by a Hamming window (25 ms) with an overlap
of 50%. A feature vector consisting of a 13-order MFCC characterized each
frame. We modeled each sound using a left-to-right six-state continuous-density
HMM without state skipping. Each HMM state was composed of two Gaussian mixture
components. After a model initialization stage was done, all the HMM models
were trained in three iterative cycles.
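
For readers who have not built the front end yet, the framing stage quoted above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's code: the pre-emphasis coefficient (typically around 0.97) and the sample rate are not given in the quoted passage, and the class and method names are hypothetical. The MFCC computation itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the framing stage: pre-emphasis, 25 ms frames, 50% overlap, Hamming window. */
final class Framing {

    /** Applies y[n] = x[n] - alpha * x[n-1]. The value of alpha is an assumption, not taken from the paper. */
    static double[] preEmphasize(double[] signal, double alpha) {
        double[] out = new double[signal.length];
        if (signal.length == 0) {
            return out;
        }
        out[0] = signal[0];
        for (int n = 1; n < signal.length; n++) {
            out[n] = signal[n] - alpha * signal[n - 1];
        }
        return out;
    }

    /** Hamming window coefficients w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)). */
    static double[] hamming(int length) {
        if (length < 2) {
            throw new IllegalArgumentException("window length must be at least 2");
        }
        double[] w = new double[length];
        for (int n = 0; n < length; n++) {
            w[n] = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * n / (length - 1));
        }
        return w;
    }

    /** Splits the signal into windowed frames; trailing samples that do not fill a whole frame are dropped. */
    static List<double[]> frames(double[] signal, int sampleRate, double frameMs, double overlap) {
        int frameLen = (int) Math.round(sampleRate * frameMs / 1000.0);
        int hop = (int) Math.round(frameLen * (1.0 - overlap));
        if (hop < 1) {
            throw new IllegalArgumentException("overlap leaves no hop between frames");
        }
        double[] window = hamming(frameLen);
        List<double[]> result = new ArrayList<>();
        for (int start = 0; start + frameLen <= signal.length; start += hop) {
            double[] frame = new double[frameLen];
            for (int n = 0; n < frameLen; n++) {
                frame[n] = signal[start + n] * window[n];
            }
            result.add(frame);
        }
        return result;
    }
}
```

With an assumed 16 kHz sample rate, `Framing.frames(Framing.preEmphasize(samples, 0.97), 16000, 25.0, 0.5)` would give 400-sample frames that advance by 200 samples. Each frame would then be turned into one 13-value MFCC vector.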

I already have the first part working, which is the feature extraction from a sample sound. As a result I get a 2D array of doubles with 13 columns per row (each row represents one frame of the sound).
Now my problem is how to train the HMM using this data.

I am using the jahmm library. So far I have written some sample code to get a general understanding of how the library works.

/** Some sample data to act as the MFCC data. Here each line, terminated by a newline,
     * is one observation. I don't know whether each line should be one row of the MFCC values
     * (representing one frame) or whether each line should represent the set of features from one audio file.
     */
    String realSequences = "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n"
            + "1.1;2.2;3.3;4.4;5.5;6.6;7.7;8.8;9.9;10.0;11.1;12.2;13.3;\n";


    /**
     * This is the reader class that reads the data and puts them in a relevant collection format.
     */
    Reader reader = new StringReader(realSequences);
    List<? extends List<ObservationReal>> sequences =
            ObservationSequencesReader.readSequences(new ObservationRealReader(), reader);
    reader.close();


    /**
     * The description states that each state is composed of two Gaussian mixture components.
     */
    OpdfGaussianMixtureFactory gMixtureFactory = new OpdfGaussianMixtureFactory(2);

    /**
     * The manual for jahmm says that the KMeans learner is a good way to initialize the HMM. It has 6 states
     * and uses the two-component Gaussian mixture factory created above.
     */
    KMeansLearner<ObservationReal> kml = new KMeansLearner<ObservationReal>(6, gMixtureFactory, sequences);
    Hmm<ObservationReal> initHmm = kml.iterate();


    /*
     * As the paper states, the HMM is trained in 3 iterative cycles.
     * Each cycle starts from the result of the previous one; passing initHmm
     * every time would just repeat the first iteration three times.
     */
    BaumWelchLearner bwl = new BaumWelchLearner();
    Hmm<ObservationReal> learntHmm = initHmm;
    for (int i = 0; i < 3; i++) {
        learntHmm = bwl.iterate(learntHmm, sequences);
    }

My questions are:

Q1: In what format should the MFCC data be passed to train the HMM? (See the comments by the realSequences line.)

Q2: In speech recognition we sometimes need to train the system by repeating the same word, let's say 10 times. Does that mean it trains one HMM with those 10 samples? If yes, how do I train one HMM with different samples of the same sound? Or is it 10 separately trained HMMs, each labeled with that word?

Q3: How do I compare two HMM models in terms of sound recognition? Is it better to use Viterbi or the Kullback-Leibler distance?

1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 1:56 am

    Q1: In what format should the MFCC data be passed to train the HMM?
    (See the comments by the realSequences line.)

    The MFCC data must be represented as:

    List<? extends List<ObservationVector>> sequences
    

    It’s a list of observation sequences. Each sequence corresponds to one recorded sample of the word (or sound) and is itself a list of vectors; each vector represents one frame and contains that frame’s 13 MFCC values.
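
A minimal sketch of that layout, using plain `double[]` frames so it runs without jahmm on the classpath. The class and method names are hypothetical. With jahmm, each frame would be wrapped in an `ObservationVector` instead (assuming its constructor that takes a `double[]`), which gives the `List<? extends List<ObservationVector>>` shown above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: arranges per-file MFCC matrices as observation sequences. */
final class MfccSequences {
    static final int MFCC_ORDER = 13;

    /** One recording: each row of the MFCC matrix (one frame) becomes one observation. */
    static List<double[]> toSequence(double[][] mfcc) {
        List<double[]> sequence = new ArrayList<>(mfcc.length);
        for (double[] frame : mfcc) {
            if (frame.length != MFCC_ORDER) {
                throw new IllegalArgumentException(
                        "expected " + MFCC_ORDER + " coefficients per frame, got " + frame.length);
            }
            // With jahmm, this is where the frame would become an ObservationVector (assumption).
            sequence.add(frame.clone());
        }
        return sequence;
    }

    /** Training set for ONE sound class: one sequence per recording of that sound. */
    static List<List<double[]>> toTrainingSet(List<double[][]> recordings) {
        List<List<double[]>> sequences = new ArrayList<>(recordings.size());
        for (double[][] mfcc : recordings) {
            sequences.add(toSequence(mfcc));
        }
        return sequences;
    }
}
```

So one row of the MFCC array is one observation, one audio file is one sequence, and all files of the same sound together form the list handed to the learner.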

    Q2: In speech recognition we sometimes need to train the system by
    repeating the same word, let's say 10 times. Does that mean it trains one
    HMM with those 10 samples?

    The input for each word is a list of sequences, one per sample, and the whole list is processed together during training.

    If yes then how to train one hmm with
    different samples of the same sound. Or is it 10 separately trained
    hmm but labeled with that word?

    It’s one HMM. The HMM training algorithm works with several samples of each word. It actually needs quite a lot of samples, more than 10.
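
To make "several samples, one HMM" concrete, here is a small sketch in which frames from all recordings of a sound are pooled into a single set of per-state statistics. It uses uniform segmentation (each recording is cut into equal parts, one per state), a simple way to initialize a left-to-right model. It is neither jahmm's k-means initialization nor Baum-Welch, and the names are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch: pool all recordings of one sound into per-state mean vectors. */
final class FlatStart {

    /**
     * Cuts every sequence into numStates equal-length segments, assigns segment k to
     * state k, and averages the frames assigned to each state over ALL sequences.
     */
    static double[][] stateMeans(List<List<double[]>> sequences, int numStates) {
        int dim = sequences.get(0).get(0).length;
        double[][] sums = new double[numStates][dim];
        int[] counts = new int[numStates];

        for (List<double[]> sequence : sequences) {
            int length = sequence.size();
            for (int t = 0; t < length; t++) {
                // Frame t of this recording falls into state floor(t * numStates / length).
                int state = (int) ((long) t * numStates / length);
                double[] frame = sequence.get(t);
                for (int d = 0; d < dim; d++) {
                    sums[state][d] += frame[d];
                }
                counts[state]++;
            }
        }

        for (int s = 0; s < numStates; s++) {
            if (counts[s] == 0) {
                throw new IllegalArgumentException("no frames were assigned to state " + s);
            }
            for (int d = 0; d < dim; d++) {
                sums[s][d] /= counts[s];
            }
        }
        return sums;
    }
}
```

The result is one model-sized set of parameters no matter how many recordings were supplied. Iterative training then refines a single model against the same pooled list of sequences, rather than producing one model per recording.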

    Q3: How to compare two hmm models in terms of sound recognition. Is it
    better to use viterbi or Kullback Leibler Distance ?

    It’s not quite clear what you mean by “compare” here. Do you want to know whether one HMM has fewer states than the other, or something else? Which property do you want to compare? The answer depends on that.
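
One possible reading, which is an assumption about the question rather than something it states: "compare" means deciding which trained HMM best matches a new recording. In that case the models are not compared with each other. The recording is scored under each model and the highest log-likelihood wins. The sketch below implements the forward algorithm in the log domain. All names are hypothetical, and the emission log-probabilities are assumed to have been computed beforehand for each frame and state.

```java
/** Hypothetical sketch: score one observation sequence under an HMM with the forward algorithm. */
final class ForwardScore {

    /** Numerically stable log(sum(exp(v[i]))). */
    static double logSumExp(double[] v) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // every term has probability zero
        }
        double sum = 0.0;
        for (double x : v) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    /**
     * Returns log P(observations | model).
     * logPi[i]   : log initial probability of state i
     * logA[i][j] : log transition probability from state i to state j
     * logB[t][i] : log emission probability of frame t in state i (precomputed)
     */
    static double logLikelihood(double[] logPi, double[][] logA, double[][] logB) {
        int n = logPi.length;
        double[] alpha = new double[n];
        for (int i = 0; i < n; i++) {
            alpha[i] = logPi[i] + logB[0][i];
        }
        double[] terms = new double[n];
        for (int t = 1; t < logB.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                for (int i = 0; i < n; i++) {
                    terms[i] = alpha[i] + logA[i][j];
                }
                next[j] = logSumExp(terms) + logB[t][j];
            }
            alpha = next;
        }
        return logSumExp(alpha);
    }

    /** Index of the model with the highest score. */
    static int bestModel(double[] scores) {
        int best = 0;
        for (int k = 1; k < scores.length; k++) {
            if (scores[k] > scores[best]) {
                best = k;
            }
        }
        return best;
    }
}
```

Viterbi would instead give the probability of the single best state path, which is often used as a cheaper approximation of the same score. The Kullback-Leibler distance measures how different two models are from each other, so it answers a different question from recognizing a recording.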

    Also, it’s important to note that HMM training for speech recognition has some specifics (how to select the number of states, which features to use, how to initialize the HMM). For that reason, for best performance it’s better to use a specialized toolkit like CMUSphinx (http://cmusphinx.sourceforge.net) rather than a generic one.
