Create a probability lexicon based on a CSV file of words and tallies

Asked: May 19, 2026

Background

Create a probability lexicon based on a CSV file of words and tallies. This is a prelude to a text segmentation problem, not a homework problem.

Problem

Given a CSV file with the following words and tallies:

aardvark,10
aardwolf,9
armadillo,9
platypus,5
zebra,1

Create a file with probabilities relative to the largest tally in the file:

aardvark,1
aardwolf,0.9
armadillo,0.9
platypus,0.5
zebra,0.1

For example, aardvark's value of 1 is 10/10 and platypus's value of 0.5 is 5/10, since 10 is the largest tally in the file.
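Stated as a formula, using notation introduced here (not in the original question): let t(w) be the tally of word w. The value written for each word is its tally divided by the largest tally in the file:

```latex
p(w) = \frac{t(w)}{\max_{v} t(v)}, \qquad
\text{e.g. } p(\text{platypus}) = \frac{5}{10} = 0.5
```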

Question

What is the most efficient way to implement a shell script to create the file of relative probabilities?

Constraints

Neither the words nor the numbers are in any order.
No major programming languages (such as Perl, Ruby, Python, Java, C, Fortran, or COBOL).
Standard Unix tools such as awk, sed, or sort are welcome.
All probabilities must be relative to the highest tally in the file.
The words are unique, the numbers are not.
The tallies are natural numbers.

Thank you!

1 Answer

Editorial Team · Answer 1 · 2026-05-19T23:21:12+00:00

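There is no need to read the file twice: store each word's tally in an array while tracking the maximum, then divide by the maximum in the END block: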

awk 'BEGIN {OFS = FS = ","} {a[$1] = $2} $2 > max {max=$2} END {for (w in a) print w, a[w]/max}' inputfile

If you need the output sorted by word:

awk ... | sort
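Here is a self-contained run of the one-pass program plus the sort, applied to the sample data from the question (deliberately shuffled, since the input is in no particular order). This is a sketch assuming a POSIX sh, awk and sort. The function name relative_probs is hypothetical. The only change to the program is `+ 0`, which forces a numeric comparison. `LC_ALL=C` makes the sort order independent of the locale.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-pass awk program from the answer.
# Reads word,tally lines from files (or stdin) and prints word,tally/max.
relative_probs() {
    awk 'BEGIN { OFS = FS = "," }
         { tally[$1] = $2 }                   # remember every word and its tally
         $2 + 0 > max + 0 { max = $2 + 0 }    # track the largest tally seen so far
         END { for (w in tally) print w, tally[w] / max }' "$@"
}

# Sample input from the question, in shuffled order.
printf '%s\n' platypus,5 zebra,1 aardvark,10 armadillo,9 aardwolf,9 |
    relative_probs | LC_ALL=C sort
```

With the question's data this prints aardvark,1, aardwolf,0.9, armadillo,0.9, platypus,0.5 and zebra,0.1, one per line. To produce the output file, redirect it, e.g. `relative_probs inputfile | LC_ALL=C sort > outputfile`.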

or, with GNU awk (asort() is a gawk extension and is not available in POSIX awk):

awk 'BEGIN {OFS = FS = ","} {a[$1] = $2; ind[j++] = $1} $2 > max {max=$2} END {n = asort(ind); for (i=1; i<=n; i++) print ind[i], a[ind[i]]/max}' inputfile
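You might need neither sort order and only want the output in the same order as the input. In that case, a portable sketch that avoids gawk's asort() records the input order in a numerically indexed array. This variant goes beyond what the answer states, and the function name relative_probs_in_order is hypothetical.

```shell
#!/bin/sh
# Portable (POSIX awk) variant: print words in their original input order.
relative_probs_in_order() {
    awk 'BEGIN { OFS = FS = "," }
         { word[++n] = $1; tally[n] = $2 }    # keep input order by line index
         $2 + 0 > max + 0 { max = $2 + 0 }    # largest tally seen so far
         END { for (i = 1; i <= n; i++) print word[i], tally[i] / max }' "$@"
}

printf '%s\n' platypus,5 zebra,1 aardvark,10 | relative_probs_in_order
```

This prints platypus,0.5, zebra,0.1 and aardvark,1, in that order. It still reads the file only once.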

If you need the output sorted by probability (ascending, with ties broken by word):

awk ... | sort -t, -k2,2n -k1,1
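The command above sorts from lowest to highest probability. If you want the most probable words first, adding the `r` modifier to the second key reverses only that key, and ties are still broken alphabetically. A minimal sketch follows. The function name sort_by_prob_desc is hypothetical, and `LC_ALL=C` is assumed so that `-n` reads `.` as the decimal point.

```shell
#!/bin/sh
# Sort word,probability lines: highest probability first, ties by word.
#   -t,      fields are comma-separated
#   -k2,2nr  numeric, reversed, on field 2 only
#   -k1,1    then plain lexical order on field 1
sort_by_prob_desc() {
    LC_ALL=C sort -t, -k2,2nr -k1,1
}

printf '%s\n' zebra,0.1 armadillo,0.9 aardvark,1 platypus,0.5 aardwolf,0.9 |
    sort_by_prob_desc
```

This prints aardvark,1, aardwolf,0.9, armadillo,0.9, platypus,0.5 and zebra,0.1. In practice you would pipe the awk output into it instead of printf.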