I am building a very basic result ranking algorithm, and one thing I’d like is a way to determine which words are generally more important in a given phrase. It doesn’t have to be exact, just general.
The obvious steps are dropping any word under 4 letters and identifying names. But what other ways can I pick out the 3 most significant words in a sentence?
In the absence of any other information, it is fair to assume that important words are rare words. Count how many times each word appears in your set of documents. The words with the lowest counts are more important, while the words with the highest counts are less important (if not nearly useless).
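As a rough sketch of that idea (not a definitive implementation), the counting and ranking could look something like this in Python. The function names `tokenize`, `build_counts`, and `most_significant` are made up for illustration, the tokenizer is a deliberately naive regex, and treating a word never seen in the documents as the rarest of all is an assumption you may want to revisit.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (naive regex)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_counts(documents):
    """Count how many times each word appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def most_significant(phrase, counts, k=3):
    """Return the k words of the phrase with the lowest corpus counts.

    Assumption: a word missing from the corpus gets a count of 0
    (Counter's default), so it ranks as the rarest.
    Ties keep the order in which the words appear in the phrase.
    """
    # Remove duplicate words while preserving their first-seen order.
    words = list(dict.fromkeys(tokenize(phrase)))
    # sorted() is stable, so equal counts stay in phrase order.
    return sorted(words, key=lambda w: counts[w])[:k]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the dog chased the cat",
        "a dog and a cat",
    ]
    counts = build_counts(docs)
    print(most_significant("the dog chased the zebra", counts))
```

Raw counts are the simplest version of this. Combining it with the short-word filter from the question would just mean filtering `words` before sorting.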