I am currently working on a script that runs through a document, pulls out all keywords, and then attempts to match these keywords with those found in other documents. There are some specifics that complicate this, but they are not very pertinent to my question. Basically, I would like to be able to match words regardless of the tense in which they appear.
For example: If given the strings “swim”, “swam”, and “swimming”, I would like a program that can recognize that these are all the same word, though whether it would store the word as swim, swam or swimming doesn’t matter all that much to me.
I’m aware that this problem could be mostly solved with a dictionary containing all of these word forms, but I am unaware of any dictionary that is mapped in a way that would be useful for this. I would prefer a solution or library that is compatible with Python, since that is what I am currently using for this script, but I would be fine with a solution in just about any language (save Haskell or Eiffel or something similarly obscure/difficult to work with).
Check out pywordnet.
Edit: This library was discontinued in 2006, when it was merged into NLTK.
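Since the WordNet interface now lives in NLTK, here is a rough sketch of how the matching could look there. It assumes NLTK is installed (`pip install nltk`) and that the WordNet corpus has been fetched once with `nltk.download("wordnet")`. The helper names `group_by_lemma` and `wordnet_verb_lemma` are made up for illustration; only `WordNetLemmatizer.lemmatize` is NLTK's own API. Note that it treats every word as a verb, which is a simplification.

```python
from collections import defaultdict


def group_by_lemma(words, lemmatize):
    """Group words that share the same base form.

    `lemmatize` is any callable mapping a lowercase word to its base form.
    """
    groups = defaultdict(list)
    for word in words:
        # Key on the base form, but keep the word as it appeared in the text.
        groups[lemmatize(word.lower())].append(word)
    return dict(groups)


def wordnet_verb_lemma(word):
    """Reduce a verb to its base form using NLTK's WordNet lemmatizer.

    Assumption: NLTK and its WordNet corpus are available locally.
    """
    # Imported lazily so group_by_lemma stays usable without NLTK.
    from nltk.stem import WordNetLemmatizer

    # pos="v" tells the lemmatizer to treat the word as a verb.
    return WordNetLemmatizer().lemmatize(word, pos="v")
```

Calling `group_by_lemma(["swim", "swam", "swimming"], wordnet_verb_lemma)` groups the forms that WordNet maps to the same base verb. Because the lemmatizer is passed in as a function, you could swap in a plain dictionary lookup or a stemmer without touching the grouping code.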