Does anyone know of a good way to calculate the ‘semantic distance’ between two words?
Immediately an algorithm that counts the steps between words in a thesaurus springs to mind.
OK, looks like a similar question has already been answered: "Is there an algorithm that tells the semantic similarity of two phrases?"
The thesaurus idea has some merit. One approach would be to build a graph from a thesaurus, with the words as nodes and an edge between two words whenever they are listed as synonyms. You could then use a shortest-path algorithm to find the distance between two nodes and treat it as a measure of their similarity: the fewer steps, the closer the words.
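A rough sketch of the idea, in case it helps. The tiny `THESAURUS` dictionary is made up for illustration (a real one would be loaded from a thesaurus data set), and the function names `build_graph` and `semantic_distance` are just my own. Since every edge counts as one step, a breadth-first search is enough to get the shortest path:

```python
from collections import deque

# Toy thesaurus invented for illustration; each entry lists a word's synonyms.
THESAURUS = {
    "happy": ["glad", "content"],
    "glad": ["happy", "pleased"],
    "pleased": ["glad", "satisfied"],
    "content": ["happy", "satisfied"],
    "satisfied": ["pleased", "content"],
    "cold": ["chilly"],
    "chilly": ["cold"],
}


def build_graph(thesaurus):
    """Build an undirected graph: words are nodes, synonym listings are edges."""
    graph = {}
    for word, synonyms in thesaurus.items():
        graph.setdefault(word, set())
        for synonym in synonyms:
            # Add the edge in both directions, even if the thesaurus
            # only lists the pair under one of the two words.
            graph[word].add(synonym)
            graph.setdefault(synonym, set()).add(word)
    return graph


def semantic_distance(graph, start, goal):
    """Return the number of synonym steps from start to goal, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        for neighbour in graph[word]:
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # no chain of synonyms connects the two words


graph = build_graph(THESAURUS)
print(semantic_distance(graph, "happy", "satisfied"))  # 2
print(semantic_distance(graph, "happy", "cold"))       # None
```

Returning `None` for unconnected words is a choice on my part; you might prefer infinity or some large constant, depending on how you use the distance afterwards.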
One difficulty here is that some words have different meanings in different contexts, so a path that passes through an ambiguous word can link two unrelated meanings. Your algorithm may need to take this into account, for example by using directed links where the weight of an outgoing link depends on the incoming link being followed, or by ignoring some outgoing links altogether based on the incoming link.