Text file 1 has the following format: ‘WORD’: 1 ‘MULTIPLE WORDS’: 1 ‘WORD’: 2

Question


Asked: May 25, 2026


Text file 1 has the following format:

'WORD': 1
'MULTIPLE WORDS': 1
'WORD': 2

etc.

That is, a quoted word or phrase, followed by a colon and a number.

Text file 2 has the following format:

'WORD'
'WORD'

etc.

I need to extract single words (i.e., only WORD not MULTIPLE WORDS) from File 1 and, if they match a word in File 2, return the word from File 1 along with its value.

I have some poorly functioning code:

import re

def GetCounts(file1, file2):
    target_contents  = open(file1).readlines()  #file 1 as list--> 'WORD': n
    match_me_contents = open(file2).readlines()   #file 2 as list -> 'WORD'
    ls_stripped = [x.strip('\n') for x in match_me_contents]  #get rid of newlines

    match_me_as_regex= re.compile("|".join(ls_stripped))   

    for line in target_contents:
        first_column = line.split(':')[0]  #get the first item in line.split
        number = line.split(':')[1]   #get the number associated with the word
        if len(first_column.split()) == 1: #get single word, no multiple words 
            """ Does the word from target contents match the word
            from match_me contents?  If so, return the line from  
            target_contents"""
            if re.findall(match_me_as_regex, first_column):  
                print first_column, number

#OUTPUT: WORD, n
#        WORD, n
#        etc.

Because of the regex, the output is shoddy. The code will return 'asset, 2', for example, since re.findall() will match 'set' from match_me inside 'asset'. I need to match the target word against the entire word from match_me, so that partial regex matches no longer produce bad output.
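The partial-match behaviour described above can be reproduced in a few lines. This is only a sketch, written for Python 3 rather than the Python 2 of the code above; the word list and the names `loose` and `strict` are made up for illustration. It contrasts the unanchored alternation with a pattern that must match the whole string:

```python
import re

# Hypothetical contents of file 2, with quotes and newlines already stripped
match_words = ["set", "word"]

# Unanchored alternation, as in the code above: matches anywhere in the string
loose = re.compile("|".join(match_words))
print(loose.findall("asset"))            # 'set' is found inside 'asset'

# Same alternation, but re.fullmatch requires the entire string to match.
# re.escape guards against words that contain regex metacharacters.
strict = re.compile("|".join(re.escape(w) for w in match_words))
print(strict.fullmatch("asset") is None)     # no whole-word match
print(strict.fullmatch("set") is not None)   # exact match
```

For plain word lists like this one, a regex is arguably not needed at all; an exact membership test against a set does the same job.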


1 Answer

Editorial Team · Answer 1 · 2026-05-25T00:25:10+00:00


If file2 is not humongous, slurp it into a set:

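# Strip the quotes from file2's words too, in case they are quoted like file1's
file2 = set(w.strip("'") for w in open("file2").read().split())
for line in open("file1"):
    # Exact set membership: 'asset' no longer matches 'set'
    if line.split(":")[0].strip().strip("'") in file2:
        print(line.rstrip("\n"))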
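For a version that returns the words with their values, here is a minimal sketch in Python 3. The function name `get_counts` and the sample data are hypothetical. It takes iterables of lines rather than file names so it can be run without files on disk, and it assumes both files quote their entries with straight single quotes, as in the question:

```python
def get_counts(target_lines, match_lines):
    """Return (word, count) pairs for single words that exactly match file 2."""
    # Exact-match lookup: one entry per non-empty line of file 2, quotes removed
    wanted = {line.strip().strip("'") for line in match_lines if line.strip()}
    results = []
    for line in target_lines:
        # Split on the last colon so the number is always the final field
        key, sep, value = line.rpartition(":")
        if not sep:
            continue  # line is not in the 'WORD': n format
        word = key.strip().strip("'")
        # Keep single words only, and only on an exact match
        if len(word.split()) == 1 and word in wanted:
            results.append((word, int(value)))
    return results


# Made-up sample data in the format described in the question
file1 = ["'asset': 2\n", "'set': 5\n", "'data set': 1\n", "'word': 3\n"]
file2 = ["'set'\n", "'word'\n"]
print(get_counts(file1, file2))
```

With real files, open file objects can be passed directly, for example `get_counts(open("file1"), open("file2"))`. The `int(value)` call raises `ValueError` on a non-numeric value, which may or may not be the behaviour you want for malformed lines.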
