I have a log file containing search queries entered into my site’s search engine. I’d like to “group” related search queries together for a report. I’m using Python for most of my webapp, so the solution can either be Python-based, or I can load the strings into Postgres if this is easier to do with SQL.
Example data:
dog food
good dog trainer
cat food
veterinarian
Groups should include:
cat:
  cat food
dog:
  dog food
  good dog trainer
food:
  dog food
  cat food
etc…
Ideas? Some sort of “indexing algorithm” perhaps?
This could be heavily optimized, but it will print the following result, assuming you place the raw data in an external text file:
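A minimal sketch of one way to build such groups, assuming a simple word-level index: each query is split on whitespace, lowercased, and filed under every word it contains. The function names (`load_queries`, `group_queries`, `format_groups`) and the file name `queries.txt` are hypothetical, chosen only for illustration, and the text file is assumed to hold one query per line.

```python
from collections import defaultdict


def load_queries(path):
    """Read one query per line from a text file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def group_queries(queries):
    """Map each word to the queries containing it, in input order."""
    groups = defaultdict(list)
    for query in queries:
        # dict.fromkeys de-duplicates words while keeping their order,
        # so a query such as "dog dog food" is filed under "dog" only once.
        for word in dict.fromkeys(query.lower().split()):
            groups[word].append(query)
    return dict(groups)


def format_groups(groups):
    """Render the groups as text: a 'word:' line, then its indented queries."""
    lines = []
    for word in sorted(groups):
        lines.append(f"{word}:")
        lines.extend(f"  {query}" for query in groups[word])
    return "\n".join(lines)
```

With the example data saved in `queries.txt`, `print(format_groups(group_queries(load_queries("queries.txt"))))` would list `cat`, `dog`, and `food` with the members shown in the question, plus single-query groups such as `good`, `trainer`, and `veterinarian`. Filtering out common words or very small groups is left out of this sketch.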