I am writing a program that counts the number of words in a text

Question

0

Asked: May 18, 20262026-05-18T21:14:23+00:00 2026-05-18T21:14:23+00:00

I am writing a program that counts the number of words in a text

0

I am writing a program that counts the number of words in a text file which is already in lowercase and separated by spaces. I want to use a dictionary and only count the word IF it’s within the dictionary. The problem is the dictionary is quite large (~100,000 words) and each text document has also ~50,000 words. As such, the codes that I wrote below gets very slow (takes about 15 sec to process one document on a quad i7 machine). I’m wondering if there’s something wrong with my coding and if the efficiency of the program can be improved. Thanks so much for your help. Code below:

public static string WordCount(string countInput)
        {
            string[] keywords = ReadDic(); /* read dictionary txt file*/

            /*then reads the main text file*/
            Dictionary<string, int> dict = ReadFile(countInput).Split(' ')
                .Select(c => c)
                .Where(c => keywords.Contains(c))
                .GroupBy(c => c)
                .Select(g => new { word = g.Key, count = g.Count() })
                .OrderBy(g => g.word)
                .ToDictionary(d => d.word, d => d.count);

            int s = dict.Sum(e => e.Value);
            string k = s.ToString();
            return k;

        }

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-18T21:14:24+00:00

You can vastly improve performance by reading the text file one line at a time instead of building an enormous string.

You can call

File.ReadLines(path).SelectMany(s => s.Split(' '))

Do not call ReadAllLines; it will need to build an enormous array.

Your first Select call is utterly useless.

Your Contains call will loop through the entire dictionary for each word in the file.
Thus, the Where call is an O(n²) operation.

Change keywords to a HashSet<string>.
Since HashSets can be searched in constant time, the Where call will become an O(n) operation, which is much better.

Your second Select call can be combined with the GroupBy, which will cut a large number of object allocations:

 .GroupBy(c => c, (word, set) => new { word, count = set.Count() })

Dictionaries are intrisically unordered, so your OrderBy call is a useless waste of time.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am writing a program that counts the number of words in a text

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply