In Weka, class StringToWordVector defines a method called setNormalizeDocLength . It normalizes word frequencies

Question

Asked: June 10, 2026

In Weka, class StringToWordVector defines a method called setNormalizeDocLength. It normalizes word frequencies of a document. My questions are:

What is meant by “normalizing word frequency of a document”?
How does Weka do this?

A practical example would help me most. Thanks in advance.

1 Answer

Editorial Team · Answer 1 · 2026-06-10T13:07:29+00:00

Looking in the Weka source, this is the method that does the normalising:

private void normalizeInstance(Instance inst, int firstCopy) throws Exception 
{
    double docLength = 0;

    if (m_AvgDocLength < 0) 
    {
        throw new Exception("Average document length not set.");
    }

    // Compute length of document vector
    for(int j=0; j<inst.numValues(); j++) 
    {
        if(inst.index(j)>=firstCopy) 
        {
            docLength += inst.valueSparse(j) * inst.valueSparse(j);
        }
    }     
    docLength = Math.sqrt(docLength);

    // Normalize document vector
    for(int j=0; j<inst.numValues(); j++) 
    {
        if(inst.index(j)>=firstCopy) 
        {
            double val = inst.valueSparse(j) * m_AvgDocLength / docLength;
            inst.setValueSparse(j, val);
            if (val == 0)
            {
                System.err.println("setting value "+inst.index(j)+" to zero.");
                j--;
            }
        }
    }
}

The most relevant part seems to be:

double val = inst.valueSparse(j) * m_AvgDocLength / docLength;
inst.setValueSparse(j, val);

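So it looks like the normalisation is value = currentValue * averageDocumentLength / actualDocumentLength.

Note that actualDocumentLength here is not a word count. The first loop computes docLength as the Euclidean length of the document's word vector, that is, the square root of the sum of the squared word values. Each document's vector is therefore rescaled so that its length equals m_AvgDocLength. The relative proportions of the words within a document stay the same, but a long document no longer gets larger values simply because it contains more words. That is what "normalizing word frequencies" means here. From its name, m_AvgDocLength is presumably the average of that length over the documents the filter was built on; the code that sets it is not part of the method quoted above.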
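Since a practical example was asked for, here is a minimal standalone sketch of the same arithmetic on plain arrays. It does not use Weka at all: the class name, the method names and the two toy documents are made up for illustration, and the average length is simply computed from those two documents, which is an assumption about how m_AvgDocLength is obtained. As in the first loop of the quoted method, the document length is the Euclidean length of the word vector.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Weka.
public class DocLengthNormalizationSketch {

    // Euclidean length of a word-count vector: sqrt of the sum of squares,
    // mirroring the "Compute length of document vector" loop.
    static double euclideanLength(double[] counts) {
        double sumOfSquares = 0;
        for (double c : counts) {
            sumOfSquares += c * c;
        }
        return Math.sqrt(sumOfSquares);
    }

    // value = currentValue * averageDocumentLength / actualDocumentLength
    static double[] normalize(double[] counts, double avgDocLength) {
        double docLength = euclideanLength(counts);
        double[] result = new double[counts.length];
        if (docLength == 0) {
            // Choice made for this sketch only: an empty document stays all zeros
            // instead of dividing by zero.
            return result;
        }
        for (int i = 0; i < counts.length; i++) {
            result[i] = counts[i] * avgDocLength / docLength;
        }
        return result;
    }

    public static void main(String[] args) {
        // Two toy documents over a two-word vocabulary.
        double[] shortDoc = {3, 4}; // Euclidean length 5
        double[] longDoc = {6, 8};  // same word proportions, Euclidean length 10

        // Assumed here: the average is taken over these two documents, giving 7.5.
        double avgDocLength = (euclideanLength(shortDoc) + euclideanLength(longDoc)) / 2;

        System.out.println(Arrays.toString(normalize(shortDoc, avgDocLength))); // [4.5, 6.0]
        System.out.println(Arrays.toString(normalize(longDoc, avgDocLength)));  // [4.5, 6.0]
    }
}
```

Both documents end up with the same vector, because they use the two words in the same proportion and differ only in length. After normalisation every document vector has Euclidean length 7.5, the average.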
