Can’t seem to get my query tweaked just right; any help would be appreciated.
Here’s my query:
SELECT
    wordlist.Word,
    SUM( worddocfreq.Freq ) AS wordFreq
FROM sourceparsed
LEFT JOIN worddocfreq ON sourceparsed.ParsedID = worddocfreq.ParsedID
LEFT JOIN wordlist ON worddocfreq.WordID = wordlist.WordID
WHERE
    sourceparsed.SrcID = 30032
GROUP BY
    wordlist.Word
This works as expected: the result set has two columns, the first a list of distinct words and the second the frequency of each word.
However, I would rather have the second column be a proportion instead (i.e. the number of occurrences of each word divided by the total number of words). The total number of words is the sum of the second column as output by the query above.
My problem is that I’m not sure how to compute the total number of words, because the ‘group by’ at the end of the query forces the sum to be computed per word. So I don’t know how to divide my second column by a sum calculated irrespective of the ‘group by’ clause.
I have a feeling a nested select is required, but I’m not sure how to integrate this optimally.
Thanks in advance for any advice.
Cheers,
Brian
I’m not certain it’s the most efficient method, but give this a shot:
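One possible shape, as a minimal untested sketch: compute the grand total once in a scalar subquery (same SrcID filter, no GROUP BY) and divide each word’s sum by it. This assumes MySQL and that WordID is unique in wordlist, so the extra join does not change the total. The aliases sp, wdf and wordProportion are made up for illustration.

```sql
SELECT
    wordlist.Word,
    -- per-word sum divided by the overall total for the same source
    SUM( worddocfreq.Freq ) / (
        -- scalar subquery: total frequency across all words, no GROUP BY
        SELECT SUM( wdf.Freq )
        FROM sourceparsed sp
        JOIN worddocfreq wdf ON sp.ParsedID = wdf.ParsedID
        WHERE sp.SrcID = 30032
    ) AS wordProportion
FROM sourceparsed
LEFT JOIN worddocfreq ON sourceparsed.ParsedID = worddocfreq.ParsedID
LEFT JOIN wordlist ON worddocfreq.WordID = wordlist.WordID
WHERE
    sourceparsed.SrcID = 30032
GROUP BY
    wordlist.Word
```

In MySQL the / operator returns a decimal, so the proportions come out as fractions. In engines that do integer division on integer columns, multiply the numerator by 1.0 first.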