Does anyone know the easiest way to extract only nouns from a body of text?

Question

Editorial Team

Asked: May 13, 2026 (2026-05-13T01:55:11+00:00)

Does anyone know the easiest way to extract only nouns from a body of text?

I’ve heard about the TreeTagger tool and I tried giving it a shot but couldn’t get it to work for some reason.

Any suggestions?

Thanks, Phil

EDIT:

    import org.annolab.tt4j.*;

    TreeTaggerWrapper tt = new TreeTaggerWrapper();
    try {
        tt.setModel("/Nouns/english.par");
        tt.setHandler(new TokenHandler() {
            void token(String token, String pos, String lemma) {
                System.out.println(token + "\t" + pos + "\t" + lemma);
            }
        });
        tt.process(words); // words = list of words
    } finally {
        tt.destroy();
    }

That is my code; the language is English. I get this error: "The type new TokenHandler(){} must implement the inherited abstract method TokenHandler.token". Am I doing something wrong?

1 Answer

Editorial Team · Answer 1 · 2026-05-13T01:55:12+00:00

First you will have to tokenize your text. This may seem trivial (splitting at any whitespace may work for you), but doing it rigorously is harder. Then you have to decide what counts as a noun. Does “the car park” contain one noun (car park), two nouns (car, park), or one noun (park) and one adjective (car)? This is a hard problem, but again you may be able to get by without solving it.
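To make the tokenization point concrete, here is a minimal sketch in plain Java comparing a whitespace split with a slightly more careful regex split. The class name, method names, and the regex are my own assumptions for illustration, not part of any toolkit, and a real tokenizer handles many more cases (abbreviations, URLs, numbers).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizeSketch {
    // Words (letters/digits, optionally joined by an internal apostrophe or
    // hyphen) or any single non-space, non-alphanumeric character.
    private static final Pattern TOKEN =
        Pattern.compile("[\\p{L}\\p{N}]+(?:['-][\\p{L}\\p{N}]+)*|[^\\s\\p{L}\\p{N}]");

    // Naive approach: split on runs of whitespace, so punctuation stays
    // glued to the neighbouring word.
    static List<String> splitOnWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Slightly more careful: punctuation marks become tokens of their own.
    static List<String> splitWordsAndPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "I saw the car park, then left.";
        System.out.println(splitOnWhitespace(text));
        System.out.println(splitWordsAndPunctuation(text));
    }
}
```

With the sample sentence, the whitespace split leaves "park," and "left." with their punctuation attached, which a tagger may not recognise; the regex version emits the comma and full stop as separate tokens.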

Does “I saw the xyzzy” identify a noun not in a dictionary? The word “the” probably identifies xyzzy as a noun.

Where are the nouns in “time flies like an arrow”? Compare with “fruit flies like a banana” (thanks to Groucho Marx).

We use the Brown tagger (Java) (http://en.wikipedia.org/wiki/Brown_Corpus) in the OpenNLP toolkit (opennlp.tools.lang.english.PosTagger; opennlp.tools.postag.POSDictionary on http://opennlp.sourceforge.net/) to find nouns in normal English, and I’d recommend starting with that; it does most of your thinking for you. Otherwise look at any of the POS taggers listed at (http://en.wikipedia.org/wiki/POS_tagger) or (http://www-nlp.stanford.edu/links/statnlp.html#Taggers).

“In part-of-speech tagging by computer, it is typical to distinguish from 50 to 150 separate parts of speech for English, for example, NN for singular common nouns, NNS for plural common nouns, NP for singular proper nouns (see the POS tags used in the Brown Corpus)”
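Once a tagger has labelled each token, keeping only the nouns is a filter on the tag. The sketch below is plain Java and assumes the tagged pairs already exist; the sample pairs are hand-written for illustration rather than real tagger output, and the prefix rule ("NN" or "NP") is my own simplification that happens to cover the tags quoted above. Check the tag set of whichever tagger you actually use.

```java
import java.util.ArrayList;
import java.util.List;

class NounFilterSketch {
    // Assumption: noun tags start with "NN" (common nouns such as NN, NNS)
    // or "NP" (proper nouns such as NP). Other tag sets may differ.
    static boolean isNounTag(String tag) {
        return tag.startsWith("NN") || tag.startsWith("NP");
    }

    // Each entry is a {token, tag} pair; returns the tokens tagged as nouns,
    // in their original order.
    static List<String> extractNouns(List<String[]> tagged) {
        List<String> nouns = new ArrayList<>();
        for (String[] pair : tagged) {
            if (isNounTag(pair[1])) {
                nouns.add(pair[0]);
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Hand-tagged sample, not produced by a real tagger.
        List<String[]> tagged = new ArrayList<>();
        tagged.add(new String[] {"The", "AT"});
        tagged.add(new String[] {"dogs", "NNS"});
        tagged.add(new String[] {"chased", "VBD"});
        tagged.add(new String[] {"Phil", "NP"});
        tagged.add(new String[] {"near", "IN"});
        tagged.add(new String[] {"the", "AT"});
        tagged.add(new String[] {"park", "NN"});
        System.out.println(extractNouns(tagged));
    }
}
```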

There is a very full list of NLP toolkits at http://en.wikipedia.org/wiki/Natural_language_processing_toolkits. I would strongly suggest you use one of those rather than trying to match words against WordNet or other word lists.
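On the compile error in the question's edit: that message usually means the anonymous class does not actually override the interface method. Two common causes in Java are omitting the type parameter on a generic interface (so the inherited method takes Object rather than String) and leaving off "public" (interface methods are implicitly public, and an implementation cannot be less visible). The sketch below uses a local stand-in interface, since I am assuming, not asserting, that the library's TokenHandler is generic with this shape; the helper name recordingHandler is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class HandlerSketch {
    // Local stand-in for the library's callback interface (an assumption
    // about its shape). Interface methods are implicitly public and abstract.
    interface TokenHandler<O> {
        void token(O token, String pos, String lemma);
    }

    // Returns a handler that records "token<TAB>pos<TAB>lemma" lines.
    // Note the <String> type argument and the "public" modifier: dropping
    // either one stops the method from overriding TokenHandler.token.
    static TokenHandler<String> recordingHandler(final List<String> out) {
        return new TokenHandler<String>() {
            @Override
            public void token(String token, String pos, String lemma) {
                out.add(token + "\t" + pos + "\t" + lemma);
            }
        };
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        TokenHandler<String> handler = recordingHandler(lines);
        // Hand-written sample values standing in for tagger callbacks.
        handler.token("flies", "NNS", "fly");
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

If the real interface matches this shape, the same two changes (adding the type argument and "public") applied to the anonymous class in the question should resolve the error.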
