I have to implement text classification for a long list of words. I have some categories defined, e.g. if the word “UK” is in the list, it will come under “Regions”. If the word is “Pizza”, it will come under the category “food”.
How can I classify the words under different categories? Is there any open source tool available to do that?
I’m not entirely sure what you’re trying to do, but if what you want is to build up a list of representative words for a number of categories, then you could do this by taking a set of documents representative of each category and selecting the top N most frequent words from it, excluding stop words. This is an easy way of creating a very basic ontology.
For example, to create a set of words about food you could crawl the web for recipes and menus and then select the most frequent words from these. I’d expect that once you have excluded stop words you’ll have a good list of food-related words. For words related to programming you could crawl stackoverflow.com, and so on.
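Here is a rough sketch of that idea in Python, in case it helps. Everything in it is made up for illustration: the tiny stop-word list, the two hard-coded "corpora" standing in for crawled pages, and the helper names `top_words`, `build_category_index` and `classify`. A real version would use a much larger set of documents and a proper stop-word list.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list would be far longer).
STOP_WORDS = {"a", "an", "and", "are", "by", "from", "in", "is",
              "of", "on", "the", "to", "with"}

def top_words(documents, n=5):
    """Return the n most frequent non-stop-words across the given documents."""
    counts = Counter()
    for text in documents:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return [word for word, _ in counts.most_common(n)]

def build_category_index(corpora, n=5):
    """Map each frequent word to the category whose documents it came from."""
    index = {}
    for category, documents in corpora.items():
        for word in top_words(documents, n):
            # If a word is frequent in several categories, the first one wins.
            index.setdefault(word, category)
    return index

def classify(word, index):
    """Look a word up in the index; words never seen are 'unknown'."""
    return index.get(word.lower(), "unknown")

# Hypothetical stand-ins for crawled pages, one small set per category.
corpora = {
    "food": [
        "Bake the pizza with cheese and tomato sauce.",
        "Pizza dough needs flour; top the pizza with cheese.",
    ],
    "regions": [
        "The UK and France are in Europe.",
        "Travel from the UK to Europe by train.",
    ],
}

index = build_category_index(corpora, n=3)
print(classify("Pizza", index))
print(classify("UK", index))
```

The lookup is just a dictionary, so any word that never made it into a top-N list comes back as "unknown". How well this works depends almost entirely on how representative the documents for each category are.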
Then again, this may not be what you’re trying to do…