I’m trying to write a script that gets the word count of each file in a directory. It’s working fairly close to what I want, but one part is throwing me off. Here is the code so far:
```python
import glob

directory = "/Users/.../.../files/*"
output = "/Users/.../.../output.txt"

filepath = glob.glob(directory)

def wordCount(filepath):
    for file in filepath:
        name = file
        fileO = open(file, 'r')
        for line in fileO:
            sentences = 0
            sentences += line.count('.') + line.count('!') + line.count('?')
            tempwords = line.split()
            words = 0
            words += len(tempwords)
            outputO = open(output, "a")
            outputO.write("Name: " + name + "\n" + "Words: " + str(words) + "\n")

wordCount(filepath)
```
This writes the word counts to a file named “output.txt” and gives me output that looks like this:
```
Name: /Users/..../..../files/Bush1989.02.9.txt
Words: 10
Name: /Users/..../..../files/Bush1989.02.9.txt
Words: 0
Name: /Users/..../..../files/Bush1989.02.9.txt
Words: 3
Name: /Users/..../..../files/Bush1989.02.9.txt
Words: 0
Name: /Users/..../..../files/Bush1989.02.9.txt
Words: 4821
```
This repeats for each file in the directory. As you can see, it gives me multiple counts for each file. The files are formatted like this:
```
Address on Administration Goals Before a Joint Session of Congress
February 9, 1989
Mr. Speaker, Mr. President, and distinguished Members of the House and
Senate…
```
So it seems that the script is giving me a count for each “part” of the file: the 10 words on the first line, 0 on the blank line, 3 on the next line, 0 on the next, and then the count for the body of the text.
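That diagnosis can be checked directly. The snippet below is a hypothetical reconstruction of the file’s first four lines, assuming the title and date are separated by blank lines (which is what the 0 counts suggest). Counting each line on its own reproduces the first four entries of the output, while summing them gives the single total for those lines.

```python
# Hypothetical first lines of Bush1989.02.9.txt, assuming blank separator lines
header = [
    "Address on Administration Goals Before a Joint Session of Congress\n",
    "\n",
    "February 9, 1989\n",
    "\n",
]

# One count per line, which is what the original script writes out
per_line = [len(line.split()) for line in header]
print(per_line)       # [10, 0, 3, 0]

# One count for all of these lines together, which is what is actually wanted
print(sum(per_line))  # 13
```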
What I’m looking for is a single count for each file. Any help/direction is appreciated.
The last two lines of your inner loop, which open the output file and write the filename and word count, should be part of the outer loop, not the inner loop; as it is, they run once per line instead of once per file.
You’re also resetting the sentence and word counts on every line. Those `= 0` initializations should be in the outer loop, before the start of the inner loop, so that the `+=` in the inner loop accumulates across all the lines of a file.
Here’s what your code should look like after the changes: