The Archive Base

Editorial Team
Asked: May 11, 2026

I am developing a system as an aid to musicians performing transcription. The aim is to perform automatic music transcription (it does not have to be perfect, as the user will correct glitches / mistakes later) on a single instrument monophonic recording. Does anyone here have experience in automatic music transcription? Or digital signal processing in general? Help from anyone is greatly appreciated no matter what your background.

So far I have investigated the use of the Fast Fourier Transform for pitch detection, and a number of tests in both MATLAB and my own Java test programs have shown it to be fast and accurate enough for my needs. Another element of the task that will need to be tackled is the display of the produced MIDI data in sheet music form, but this is something I am not concerned with right now.

In brief, what I am looking for is a good method for note onset detection, i.e. the position in the signal where a new note begins. As slow onsets can be quite difficult to detect properly, I will initially be using the system with piano recordings. This is also partially due to the fact I play piano and should be in a better position to obtain suitable recordings for testing. As stated above, early versions of this system will be used for simple monophonic recordings, possibly progressing later to more complex input depending on progress made in the coming weeks.

1 Answer

  1. Added an answer on May 11, 2026 at 1:35 am

    Here is a graphic that illustrates the threshold approach to note onset detection:

    [Figure: waveform of three notes, with the threshold in red and the detected note starts in blue]

    This image shows a typical WAV file with three discrete notes played in succession. The red line represents a chosen signal threshold, and the blue lines represent note start positions returned by a simple algorithm that marks a start when the signal level crosses the threshold.

    As the image shows, selecting a proper absolute threshold is difficult. In this case, the first note is picked up fine, the second note is missed completely, and the third note is (barely) picked up, but very late. In general, a low threshold causes you to pick up phantom notes, while raising it causes you to miss notes. One solution to this problem is to use a relative threshold that triggers a start if the signal increases by a certain percentage over a certain time, but this has problems of its own.
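The relative-threshold idea can be sketched roughly as follows. This is only an illustration in Java (the question mentions Java test programs), not a tuned detector: the class name `RelativeThreshold` and the parameters `window`, `ratio`, and `floor` are assumptions made up for this example. It compares the mean absolute level of each window with the previous window and reports a start when the level has grown by at least `ratio`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; names and parameters are assumptions.
final class RelativeThreshold {
    /** Mean absolute level of samples[start, start + length). */
    static double level(short[] samples, int start, int length) {
        long sum = 0;
        for (int i = start; i < start + length; i++) {
            sum += Math.abs((int) samples[i]);
        }
        return (double) sum / length;
    }

    /**
     * Returns the start index of every window whose level is above `floor`
     * and at least `ratio` times the level of the window before it.
     */
    static List<Integer> findOnsets(short[] samples, int window, double ratio, double floor) {
        List<Integer> onsets = new ArrayList<>();
        double previous = 0.0; // treat the time before the recording as silence
        for (int start = 0; start + window <= samples.length; start += window) {
            double current = level(samples, start, window);
            // Math.max(previous, 1.0) avoids comparing against an all-zero window.
            if (current > floor && current >= ratio * Math.max(previous, 1.0)) {
                onsets.add(start);
            }
            previous = current;
        }
        return onsets;
    }
}
```

The `floor` parameter is there because a purely relative test would otherwise fire on tiny fluctuations in near-silence, which is one of the "problems of its own" mentioned above.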

    A simpler solution is to use the somewhat-counterintuitively named compression (not MP3 compression – that’s something else entirely) on your wave file first. Compression essentially flattens the spikes in your audio data and then amplifies everything so that more of the audio is near the maximum values. The effect on the above sample would look like this (which shows why the name ‘compression’ appears to make no sense – on audio equipment it’s usually labelled ‘loudness’):

    [Figure: the same waveform after compression]

    After compression, the absolute threshold approach will work much better (although it’s easy to over-compress and start picking up fictional note starts, the same effect as lowering the threshold). There are a lot of wave editors out there that do a good job of compression, and it’s better to let them handle this task – you’ll probably need to do a fair amount of work ‘cleaning up’ your wave files before detecting notes in them anyway.

    In coding terms, a 16-bit WAV file loaded into memory is essentially just an array of two-byte integers, where 0 represents no signal and 32,767 and -32,768 represent the peaks. In its simplest form, a threshold detection algorithm would just start at the first sample and read through the array until it finds a sample whose absolute value is greater than the threshold.

        short threshold = 10000;
        for (int i = 0; i < samples.Length; i++)
        {
            // widen to int first: Math.Abs(short.MinValue) would overflow a short
            if (Math.Abs((int)samples[i]) > threshold)
            {
                // here is one note onset point
            }
        }
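For anyone working in Java, as the question mentions, here is a minimal sketch of how the raw bytes of a WAV data chunk become the array of two-byte integers described above. It assumes 16-bit, mono, little-endian PCM with the header already stripped (for example by reading through `javax.sound.sampled.AudioInputStream`); the class name `PcmDecoder` is made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper; assumes 16-bit little-endian mono PCM with no header.
final class PcmDecoder {
    /** Converts little-endian 16-bit PCM bytes into signed samples. */
    static short[] toSamples(byte[] pcm) {
        short[] samples = new short[pcm.length / 2]; // a trailing odd byte is ignored
        ByteBuffer.wrap(pcm)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }
}
```

Stereo files interleave left and right samples, so they would need to be split or mixed down before applying a single-channel threshold.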

    In practice this works horribly, since normal audio has all sorts of transient spikes above a given threshold. One solution is to use a running average signal strength (i.e. don’t mark a start until the average absolute value of the last n samples is above the threshold).

        short threshold = 10000;
        int window_length = 100;
        int running_total = 0;
        // tally up the first window_length samples (absolute values,
        // since raw audio samples average out to roughly zero)
        for (int i = 0; i < window_length; i++)
        {
            running_total += Math.Abs((int)samples[i]);
        }
        // calculate moving average
        for (int i = window_length; i < samples.Length; i++)
        {
            // remove oldest sample and add current
            running_total -= Math.Abs((int)samples[i - window_length]);
            running_total += Math.Abs((int)samples[i]);
            int moving_average = running_total / window_length;
            if (moving_average > threshold)
            {
                // here is one note onset point
                int onset_point = i - (window_length / 2);
            }
        }

    All of this requires much tweaking and playing around with settings to get it to find the start positions of a WAV file accurately, and usually what works for one file will not work very well on another. This is a very difficult and not-perfectly-solved problem domain you’ve chosen, but I think it’s cool that you’re tackling it.

    Update: this graphic shows a detail of note detection I left out, namely detecting when the note ends:

    [Figure: waveform with the off-threshold in yellow and the detected note ends in purple]

    The yellow line represents the off-threshold. Once the algorithm has detected a note start, it assumes the note continues until the running average signal strength drops below this value (shown here by the purple lines). This is, of course, another source of difficulties, as is the case where two or more notes overlap (polyphony).
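The start/stop logic described above (an on-threshold and a lower off-threshold applied to the running average) might look something like this in Java. It is a sketch under assumptions, not a finished detector: `NoteSegmenter`, `window`, `onThreshold`, and `offThreshold` are names invented for the example, and it handles only one note at a time, so overlapping notes are not covered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; monophonic input only.
final class NoteSegmenter {
    /**
     * Returns {start, end} sample-index pairs. A note starts when the moving
     * average of |sample| rises above onThreshold and ends when it drops
     * below offThreshold (which should be lower than onThreshold).
     */
    static List<int[]> segment(short[] samples, int window, int onThreshold, int offThreshold) {
        List<int[]> notes = new ArrayList<>();
        long runningTotal = 0;
        boolean inNote = false;
        int noteStart = 0;
        for (int i = 0; i < samples.length; i++) {
            runningTotal += Math.abs((int) samples[i]);
            if (i >= window) {
                runningTotal -= Math.abs((int) samples[i - window]);
            }
            if (i < window - 1) {
                continue; // window not yet full
            }
            long average = runningTotal / window;
            if (!inNote && average > onThreshold) {
                inNote = true;
                noteStart = i;
            } else if (inNote && average < offThreshold) {
                inNote = false;
                notes.add(new int[] {noteStart, i});
            }
        }
        if (inNote) {
            // the note is still sounding when the data ends
            notes.add(new int[] {noteStart, samples.length - 1});
        }
        return notes;
    }
}
```

Keeping the off-threshold below the on-threshold stops a note that hovers near a single threshold from being split into many short notes.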

    Once you’ve detected the start and stop points of each note, you can now analyze each slice of WAV file data to determine the pitches.

    Update 2: I just read your updated question. Pitch-detection through auto-correlation is much easier to implement than FFT if you’re writing your own from scratch, but if you’ve already checked out and used a pre-built FFT library, you’re better off using it for sure. Once you’ve identified the start and stop positions of each note (and included some padding at the beginning and end for the missed attack and release portions), you can now pull out each slice of audio data and pass it to an FFT function to determine the pitch.
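For reference, a bare-bones autocorrelation pitch estimate could be sketched like this in Java. Treat it as an illustration of the idea only: `AutocorrelationPitch`, `minHz`, and `maxHz` are assumptions made for the example, the search is brute force, and a version this simple can lock onto a multiple of the true period if the search range is too wide.

```java
// Hypothetical helper for illustration; brute-force, no octave-error handling.
final class AutocorrelationPitch {
    /**
     * Estimates the fundamental frequency (Hz) of a slice by finding the lag,
     * between sampleRate/maxHz and sampleRate/minHz, with the highest
     * normalised autocorrelation. Returns NaN if no positive peak is found.
     */
    static double estimateHz(short[] slice, int sampleRate, double minHz, double maxHz) {
        int minLag = Math.max((int) (sampleRate / maxHz), 1);
        int maxLag = Math.min((int) (sampleRate / minHz), slice.length - 1);
        int bestLag = -1;
        double bestScore = 0.0;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double sum = 0.0;
            for (int i = 0; i + lag < slice.length; i++) {
                sum += (double) slice[i] * slice[i + lag];
            }
            // Divide by the overlap length so longer lags are not penalised.
            double score = sum / (slice.length - lag);
            if (score > bestScore) {
                bestScore = score;
                bestLag = lag;
            }
        }
        return bestLag > 0 ? (double) sampleRate / bestLag : Double.NaN;
    }
}
```

A call such as `AutocorrelationPitch.estimateHz(slice, 44100, 300.0, 1000.0)` restricts the search to lags that correspond to pitches between 300 Hz and 1000 Hz.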

    One important point here is not to use a slice of the compressed audio data, but rather to use a slice of the original, unmodified data. The compression process distorts the audio and may produce an inaccurate pitch reading.

    One last point about note attack times: they may be less of a problem than you think. Often in music an instrument with a slow attack (like a soft synth) will begin a note earlier than a sharp-attack instrument (like a piano), and both notes will sound as if they’re starting at the same time. If you’re playing instruments in this manner, the algorithm will pick up the same start time for both kinds of instruments, which is good from a WAV-to-MIDI perspective.

    Last update (I hope): Forget what I said about including some padding samples from the early attack part of each note – I forgot this is actually a bad idea for pitch detection. The attack portions of many instruments (especially piano and other percussive-type instruments) contain transients that aren’t multiples of the fundamental pitch, and will tend to screw up pitch detection. You actually want to start each slice a little after the attack for this reason.

    Oh, and kind of important: the term ‘compression’ here does not refer to MP3-style compression.

    Update again: here is a simple function that does non-dynamic compression:

        public void StaticCompress(short[] samples, float param)
        {
            for (int i = 0; i < samples.Length; i++)
            {
                int sign = (samples[i] < 0) ? -1 : 1;
                // divide by 32768.0, not 32768, to avoid integer division
                float norm = ABS(samples[i] / 32768.0); // NOT short.MaxValue
                norm = 1.0 - POW(1.0 - norm, param);
                samples[i] = 32768 * norm * sign;
            }
        }

    When param = 1.0, this function will have no effect on the audio. Larger param values (2.0 is good, which will square the normalized difference between each sample and the max peak value) will produce more compression and a louder overall (but crappy) sound. Values under 1.0 will produce an expansion effect.

    One other probably obvious point: you should record the music in a small, non-echoic room since echoes are often picked up by this algorithm as phantom notes.

    Update: here is a version of StaticCompress that will compile in C# and explicitly casts everything. This returns the expected result:

        public void StaticCompress(short[] samples, double param)
        {
            for (int i = 0; i < samples.Length; i++)
            {
                Compress(ref samples[i], param);
            }
        }

        public void Compress(ref short orig, double param)
        {
            double sign = 1;
            if (orig < 0)
            {
                sign = -1;
            }
            // 32768 is max abs value of a short. best practice is to pre-
            // normalize data or use peak value in place of 32768
            double norm = Math.Abs((double)orig / 32768.0);
            norm = 1.0 - Math.Pow(1.0 - norm, param);
            // round, then clamp so a result of +32768 cannot overflow a short
            double scaled = Math.Round(32768.0 * norm * sign);
            orig = (short)Math.Max(short.MinValue, Math.Min(short.MaxValue, scaled));
        }
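Since the question mentions Java, here is a rough Java port of the same per-sample mapping, rounding and clamping the result. The class name `StaticCompressor` is an assumption made for this example, and this is a sketch rather than a drop-in replacement for a real compressor.

```java
// Hypothetical Java port of the per-sample mapping; the name is illustrative.
final class StaticCompressor {
    /** Applies norm -> 1 - (1 - norm)^param to one 16-bit sample. */
    static short compress(short orig, double param) {
        double sign = orig < 0 ? -1.0 : 1.0;
        double norm = Math.abs(orig / 32768.0);
        norm = 1.0 - Math.pow(1.0 - norm, param);
        long scaled = Math.round(32768.0 * norm * sign);
        // Clamp so a result of +32768 cannot wrap around to -32768.
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
    }
}
```

As a worked value: with param = 2.0, a half-scale sample of 16384 gives norm = 0.5, then 1 - (1 - 0.5)^2 = 0.75, and 0.75 × 32768 = 24576. With param = 1.0 the sample comes back unchanged.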

    Sorry, my knowledge score on Matlab is 0. If you posted another question on why your Matlab function doesn’t work as expected it would get answered (just not by me).
