I need to write a Python program that prints the frequency of each letter in a
text file and compares it with the letter frequencies of another file.
So far I can print the number of times each letter occurs, but the
percentage frequency I get is wrong. I think this is because my program needs to
count only the letters in the file, ignoring spaces and other
characters.
def addLetter(x):
    result = ord(x) - ord('a')
    return result

#start of the main program
#prompt user for a file
while True:
    speech = raw_input("Enter file name:")
    wholeFile = open(speech, 'r+').read()
    lowlet = wholeFile.lower()
    letters = list(lowlet)
    alpha = list('abcdefghijklmnopqrstuvwxyz')
    n = len(letters)
    f = float(n)
    occurrences = {}
    d = {}
    #number of letters
    for x in alpha:
        occurrences[x] = letters.count(x)
        d[x] = (occurrences[x])/f
    for x in occurrences:
        print x, occurrences[x], d[x]
This is the output:
Enter file name:dems.txt
a 993 0.0687863674148
c 350 0.0242449431976
b 174 0.0120532003325
e 1406 0.0973954003879
d 430 0.0297866444999
g 219 0.015170407315
f 212 0.0146855084511
i 754 0.0522305347742
h 594 0.0411471321696
k 81 0.00561097256858
j 12 0.000831255195345
m 273 0.0189110556941
l 442 0.0306178996952
o 885 0.0613050706567
n 810 0.0561097256858
q 9 0.000623441396509
p 215 0.0148933222499
s 672 0.0465502909393
r 637 0.0441257966196
u 305 0.021127736215
t 1175 0.0813937378775
w 334 0.0231366029371
v 104 0.00720421169299
y 212 0.0146855084511
x 13 0.000900526461624
z 6 0.000415627597672
Enter file name:
The program does print in columns, but I’m not really sure how to display that here.
The frequency for “a” should be 0.0878.
You could use the translator recipe to drop all characters not in
alpha. Since doing so makes letters contain nothing but characters from
alpha, n is now the correct denominator. You could then use a
collections.defaultdict(int) to count the occurrences of the letters:
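
A minimal sketch of that idea follows. It is not the translator recipe itself: it filters with a plain list comprehension instead, which has the same effect of keeping only characters from alpha. The function name letter_frequencies and the sample string are made up for illustration, and the code is written so that it should run on both Python 2 and Python 3.

```python
import collections
import string


def letter_frequencies(text):
    """Return (counts, freqs) for the letters a-z found in text.

    Hypothetical helper for illustration, not part of the original post.
    """
    alpha = set(string.ascii_lowercase)
    # Keep only the letters, so spaces, digits and punctuation
    # no longer inflate the denominator.
    letters = [ch for ch in text.lower() if ch in alpha]
    n = float(len(letters))  # float avoids integer division on Python 2

    counts = collections.defaultdict(int)
    for ch in letters:
        counts[ch] += 1  # a missing key starts at 0

    # Guard against input that contains no letters at all.
    freqs = dict((ch, counts[ch] / n) for ch in counts) if n else {}
    return counts, freqs


# Made-up sample input; in the real program this would be the file contents.
sample = "Hello, World! 123"
counts, freqs = letter_frequencies(sample)
for ch in sorted(counts):
    print("%s %d %.4f" % (ch, counts[ch], freqs[ch]))
```

As a sanity check against the output posted in the question: the 26 letter counts there add up to 11317, and 993 / 11317 is about 0.0877, which is close to the value expected for "a".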