I need to create a simple text-file-based search engine ASAP (using PHP). Basically, it has to read the files in a directory, remove stop words and other useless words, and index each remaining useful word along with how many times it appears in each document.
I guess the pseudocode for this is:
for each file in directory:
    read in contents,
    compare to stop words,
    add each remaining word to array,
    count how many times that word appears in document,
    add that number to the array,
    add the id/name of the file to the array,
I also need to count the total number of words (after removing the useless ones, I guess) in the whole file. I'm guessing that can be done afterwards, as long as I can get the file ID from that array and then count the words inside?
Can anyone help, maybe by providing a barebones structure? I think the main bit I need help with is getting the number of times each word appears in the document and adding it to the index array.
Thanks
1 Answer
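A barebones structure along the lines of the pseudocode in the question might look like the sketch below. It is only one way to do it, and it rests on a few assumptions that are not in the question: the documents are `.txt` files sitting directly in one directory, the text is plain ASCII English, and the stop-word list is a tiny placeholder. The names `STOP_WORDS`, `tokenize`, and `buildIndex` are made up for illustration. The part you asked about (counting how often each word appears in a document) is handled by PHP's built-in `array_count_values`.

```php
<?php
// Hypothetical, deliberately tiny stop-word list; swap in a fuller one.
const STOP_WORDS = ['a', 'an', 'and', 'the', 'of', 'to', 'in', 'is', 'it'];

/**
 * Lowercase the text, split it into words, and drop the stop words.
 * Assumes ASCII text; anything that is not a letter, digit, or
 * apostrophe is treated as a separator.
 */
function tokenize(string $text, array $stopWords): array
{
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    // array_diff keeps every word that does not appear in $stopWords.
    return array_values(array_diff($words, $stopWords));
}

/**
 * Index every .txt file in $dir.
 * Returns:
 *   'index'  => [word => [file name => occurrences in that file]]
 *   'totals' => [file name => number of words left after stop-word removal]
 */
function buildIndex(string $dir, array $stopWords): array
{
    $index  = [];
    $totals = [];

    // glob() can return false on error, so fall back to an empty list.
    foreach (glob($dir . '/*.txt') ?: [] as $path) {
        $contents = file_get_contents($path);
        if ($contents === false) {
            continue; // unreadable file, skip it
        }

        $file  = basename($path);
        $words = tokenize($contents, $stopWords);

        // Total useful words in this file.
        $totals[$file] = count($words);

        // array_count_values gives word => count for this one document.
        foreach (array_count_values($words) as $word => $count) {
            $index[$word][$file] = $count;
        }
    }

    return ['index' => $index, 'totals' => $totals];
}
```

After calling `$result = buildIndex('/path/to/docs', STOP_WORDS);` (the path is a placeholder), `$result['index']['someword']` holds an array of file names mapped to how many times that word occurs in each, and `$result['totals']` holds the per-file word totals, so there is no need for a second pass over the files. Keying the index by word first keeps lookups at search time to a single array access.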