I want to let the user choose and open multiple text files and search them for exact matches.
I want the encoding to be Unicode.
If I search for “cat” I want it to find “cat”, “cat,”, “.cat” but not “catalogue”.
I don’t know how to let the user search for two words (“cat” OR “dog”) in all of the texts at the same time.
Maybe I can use regular expressions (re)?
So far I have only made it possible for the user to enter the path to the directory containing the text files to search. Now I want to let the user (via raw_input) search for two words in all of the texts, and then print the results and save them in a separate document (search_words), e.g. “search_word_1” and “search_word_2” found in document1.txt, “search_word_2” found in document4.txt.

    import re, os

    path = raw_input("insert path to directory: ")
    ex_library = os.listdir(path)
    search_words = open("sword.txt", "w")  # file (or maybe a list) to put the results in
    thelist = []
    for texts in ex_library:
        f = os.path.join(path, texts)
        text = open(f, "r")
        textname = os.path.basename(texts)
        print textname
        for line in text:  # iterate over lines; text.read() would yield single characters
            pass  # the search for the two words should go here
        text.close()
    search_words.close()

Regular expressions are an appropriate tool in this case.
Pattern:

    r'\bcat\b'

\b matches at a word boundary.

Pattern:

    r'\bcat\b|\bdog\b'

To print "filename: <words that are found in it>":

Example:
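One way to produce the "filename: words" report is sketched below. It is only an illustration: the function names `build_pattern`, `find_words_in_dir` and `write_report`, the UTF-8 default and the comma-separated output format are assumptions, not part of the question. It is written against Python 3; the question's code is Python 2, where the same `re`, `os` and `io` calls exist but byte strings from `raw_input` would need decoding first.

```python
import io
import os
import re


def build_pattern(words):
    # Escape each word and wrap it in \b so that "cat" does not match "catalogue".
    alternatives = "|".join(r"\b%s\b" % re.escape(w) for w in words)
    return re.compile(alternatives, re.UNICODE)


def find_words_in_dir(path, words, encoding="utf-8"):
    """Return {filename: sorted list of the search words found in that file}."""
    pattern = build_pattern(words)
    results = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue  # skip subdirectories
        with io.open(full, encoding=encoding) as f:
            found = set(pattern.findall(f.read()))
        if found:
            results[name] = sorted(found)
    return results


def write_report(results, report_path="sword.txt", encoding="utf-8"):
    """Print one 'filename: word, word' line per file and save the same lines."""
    with io.open(report_path, "w", encoding=encoding) as out:
        for name in sorted(results):
            line = "%s: %s" % (name, ", ".join(results[name]))
            print(line)
            out.write(line + "\n")
```

A call such as `write_report(find_words_in_dir(path, ["cat", "dog"]))` would then print and save one line per file that contains at least one of the words. The search is case-sensitive, so "Cat" is not reported unless `re.IGNORECASE` is added to the flags.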
Alternative solutions
To avoid reading the whole file in memory:
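A minimal sketch of the line-by-line approach, assuming UTF-8 files and single-word search terms (a match can never span two lines here). The helper name `words_in_file` is hypothetical. Iterating over the file object yields one line at a time, so only the current line is held in memory, and the scan stops early once every word has been seen.

```python
import io
import re


def words_in_file(filename, words, encoding="utf-8"):
    """Scan one file line by line and return the set of search words it contains."""
    pattern = re.compile(
        "|".join(r"\b%s\b" % re.escape(w) for w in words), re.UNICODE
    )
    wanted = set(words)
    found = set()
    with io.open(filename, encoding=encoding) as f:
        for line in f:  # the file object yields one line at a time
            found.update(pattern.findall(line))
            if found == wanted:
                break  # every word has been seen; no need to read further
    return found
```

The early exit matters mostly for large files where the words appear near the top; for small files, reading the whole text at once is just as reasonable.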
You could also print the found words in context, e.g. print the matching lines with the words highlighted:
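A small sketch of such highlighting, under the assumption that wrapping each match in `**` is an acceptable way to mark it (ANSI colour codes would work the same way). The generator name `highlight_matches` and its `start`/`end` parameters are made up for this illustration.

```python
import re


def highlight_matches(lines, words, start="**", end="**"):
    """Yield (line number, line with matches marked) for each line that has a match."""
    pattern = re.compile(
        "|".join(r"\b%s\b" % re.escape(w) for w in words), re.UNICODE
    )
    for number, line in enumerate(lines, 1):
        # subn returns the new string and the number of substitutions made
        marked, count = pattern.subn(
            lambda m: start + m.group() + end, line.rstrip("\n")
        )
        if count:
            yield number, marked


sample = ["The cat sat.\n", "See the catalogue.\n", "A dog chased the cat.\n"]
for number, marked in highlight_matches(sample, ["cat", "dog"]):
    print("%d: %s" % (number, marked))
# prints:
# 1: The **cat** sat.
# 3: A **dog** chased the **cat**.
```

Because `lines` can be any iterable of strings, an open file object can be passed in directly, which keeps the line-by-line memory behaviour.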