I have a large text file (data) which has the following layout: a|b|c|d|e|f (although b can have pipes within it). I search the text file with the following code:
results = 0
Data_List = []
searchphrase = input("Search: ")
with open('data', 'r', encoding="utf8") as inF:
    for line in inF:
        if searchphrase in line:
            a, *b, c, d, e, f = line.strip().split('|')
            b = '|'.join(b)
            results += 1
            print("\n\n", results, "\n", "A: " + a + "\n", "B: " + b + "\n", "C: " + c + "\n", "D: " + d + "\n", "E: " + e + "\n", "F: " + f + "\n\n")
            Data_List.append(f)
b is a piece of text that contains the title, which is what the user is really searching for in the code above (for example: The Lion King). However, the search is very strict and only returns exact matches: a line is found only if it contains the search phrase character for character, so a title stored as "The Lion King" is not returned when I search for "the lion king". How can I make the search less specific and more generalised (think Google searches)?
You may take a look at Whoosh.
It can handle:
And much more… Whoosh is pure-Python and Python 3 compatible.
Unfortunately, fuzzy search is linked to natural language processing (NLP), one of the most complex subjects in CS, so it is not as easy as using some magic regular expression trick.
NLP is hard, period. That is why Google uses a Pigeon Cluster to rank results instead of computer algorithms (LoL).
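If a full search library is more than you need, here is a rough sketch that uses only the standard library. It is not how Whoosh works; it simply lowercases both sides with `str.casefold()` and uses `difflib.get_close_matches` to tolerate small typos, word by word. The names `parse_record`, `title_matches` and `search`, and the cutoff of 0.8, are my own assumptions, so tune them against your data.

```python
import difflib


def parse_record(line):
    """Split 'a|b|c|d|e|f' where b may itself contain pipes.

    Raises ValueError if the line has fewer than five pipe-separated fields.
    """
    a, *b, c, d, e, f = line.rstrip("\n").split("|")
    return a, "|".join(b), c, d, e, f


def title_matches(query, title, cutoff=0.8):
    """Return True if every query word loosely matches some word in title.

    The cutoff (0.0 to 1.0) is a hypothetical default: higher is stricter.
    """
    query_words = query.casefold().split()
    title_words = title.casefold().split()
    if not query_words:
        return False
    # get_close_matches returns a (possibly empty) list of similar words.
    return all(
        difflib.get_close_matches(word, title_words, n=1, cutoff=cutoff)
        for word in query_words
    )


def search(lines, query):
    """Yield parsed records whose b field (the title) matches the query."""
    for line in lines:
        record = parse_record(line)
        if title_matches(query, record[1]):
            yield record
```

To use it with the file from the question, something like `with open('data', encoding='utf8') as inF: hits = list(search(inF, searchphrase))` should do. It only looks at the title field, ignores word order, and scans the whole file on every query, so for a really large file an indexed solution such as Whoosh is still likely the better fit.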