I have two lists. I need to determine which word from the first list

Asked: June 11, 2026 (2026-06-11T04:20:52+00:00)

I have two lists and need to determine which word from the first list appears most frequently in the second list. The first file, list1.txt, contains a list of words, sorted alphabetically, with no duplicates. I have used some scripts to ensure that each word appears on its own line, e.g.:

canyon
fish
forest
mountain
river

The second file, list2.txt, is in UTF-8 and also contains many items. I have again used scripts to ensure that each item appears on its own line, but some items are not words, and some appear many times, e.g.:

fish
canyon
ocean
ocean
ocean
ocean
1423
fish
109
fish
109
109
ocean

The script should output the most frequently matching item. For example, if run with the two files above, the output would be “fish”, because that is the word from list1.txt that occurs most often in list2.txt.

Here is what I have so far. First, it searches for each word and creates a CSV file with the matches:

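#!/bin/bash
# For each word in list1.txt, count the lines of list2.txt that equal it
# and append "word,count" to found.csv.
while IFS= read -r line
do
    # -x: whole-line match, -F: literal string, --: end of options
    count=$(grep -cxF -- "$line" list2.txt)
    echo "$line,$count" >> found.csv
done < ./list1.txt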

After that, found.csv is sorted descending by the second column. The output is the word appearing on the first line.
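
The sorting step described here could be sketched as follows. This is only an illustration: the function name top_word is hypothetical, and it assumes the "word,count" layout written by the loop above, with no commas inside the words.

```shell
#!/bin/bash
# Sketch: print the word with the highest count from a "word,count" CSV.
# top_word is an illustrative name, not part of the original script.
top_word() {
    # -t, : fields are comma-separated
    # -k2,2nr : sort on the second field only, numerically, descending
    sort -t, -k2,2nr "$1" | head -n 1 | cut -d, -f1
}
```

Calling `top_word found.csv` prints only the word from the first line after sorting, so a tie between several words is not reported.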
I do not think this is a good script, though: it is inefficient, because it scans list2.txt once for every word in list1.txt, and it does not handle the cases where there is no single most frequent matching item:

If there is a tie between two or more words, e.g. “fish”, “canyon”, and “forest” each appear 5 times while no other word appears as often, the output should be those words in alphabetical order, separated by commas, e.g. “canyon,fish,forest”.
If none of the words from list1.txt appears in list2.txt, the output should simply be the first word from list1.txt, e.g. “canyon”.

How can I create a more efficient script which finds which word from the first list appears most often in the second?

1 Answer

Editorial Team · Answer 1 · 2026-06-11T04:20:53+00:00

You can use the following pipeline:

grep -Fxf list1.txt list2.txt | sort | uniq -c | sort -n | tail -n1

-F tells grep to treat the patterns as literal strings, -x makes it match whole lines only (so that “fish” does not also count a line such as “fisherman”), and -f tells it to read the patterns from list1.txt. The rest of the pipeline sorts the matches, counts duplicates, and sorts the counts numerically. The final tail selects the last line, i.e. the most common word, preceded by its number of occurrences.
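
The pipeline prints a single line, so it does not cover the two special cases from the question (ties, and no match at all). One possible way to handle them in a single pass over each file is an awk script along these lines. Treat it as a sketch under assumptions: the function name most_frequent is made up for illustration, list1.txt is assumed to be non-empty and already sorted alphabetically (so file order equals alphabetical order), and a match means the whole line equals the word.

```shell
#!/bin/bash
# Sketch: print the word(s) from the first file that occur most often as
# whole lines in the second file. most_frequent is an illustrative name.
# Assumes the first file is non-empty and sorted alphabetically.
most_frequent() {
    awk '
        # First file: remember each word and its position in the file.
        NR == FNR { order[++n] = $0; wanted[$0] = 1; next }
        # Second file: count lines that exactly equal a wanted word.
        $0 in wanted { count[$0]++ }
        END {
            if (n == 0) exit
            # Find the highest count among the wanted words.
            max = 0
            for (i = 1; i <= n; i++)
                if (count[order[i]] > max) max = count[order[i]]
            # No word matched at all: fall back to the first word.
            if (max == 0) { print order[1]; exit }
            # Join every word that reaches the maximum, in file order.
            out = ""
            for (i = 1; i <= n; i++)
                if (count[order[i]] == max)
                    out = (out == "") ? order[i] : (out "," order[i])
            print out
        }
    ' "$1" "$2"
}
```

It would be called as `most_frequent list1.txt list2.txt`. Because the words are kept in an array lookup, list2.txt is read only once no matter how many words list1.txt contains.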
