I am trying to figure out how to train the stanford LexicalizedParser ( edu.stanford.nlp.parser.lexparser.LexicalizedParser

Question


Asked: June 17, 2026 (2026-06-17T17:45:26+00:00)


I am trying to figure out how to train the stanford LexicalizedParser
( edu.stanford.nlp.parser.lexparser.LexicalizedParser ) to incorporate new nouns into its lexicon.

At first my goal was to take an existing model and tweak it slightly, rather than creating a brand new model
from a vast set of training examples.

The answer to this question suggests that this is not possible:
How can I add more tagged words to the Stanford POS-Tagger's trained models?

Hopefully someone out there can put me on the right track as to how to do this.

As a concrete example of what I want to do, say I have the word ‘researchgate’, which I want to be treated as a noun when I parse
sentences. Currently, ‘researchgate’ is assigned different parts of speech depending on its
position, but I want it identified as an ‘NN’ (noun).

Examples…

Instead of this:

      (NP
        (NP (JJ recent) (NN activity))
        (PP (IN in)
          (NP (PRP$ your) (JJ researchgate) (NNS topics)))))

I want this:

      (NP
        (NP (JJ recent) (NN activity))
        (PP (IN in)
          (NP (PRP$ your) (NN researchgate) (NNS topics)))))

And instead of this:

    (ROOT
      (FRAG
        (NP (NN subscription))
        (S
          (VP (TO to)
            (VP (VB researchgate))))))

I want this:

    (ROOT
      (NP
        (NP (NN subscription))
        (PP (TO to)
          (NP (NN researchgate)))))

I am currently using this model: models/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz

I tried this:

    java -cp stanford-parser.jar \
        edu.stanford.nlp.parser.lexparser.LexicalizedParser -train /tmp/train.txt

with the contents of /tmp/train.txt as follows:

              (NP
                (NP (JJ recent) (NN activity))
                (PP (IN in)
                  (NP (PRP$ your) (JJ researchgate) (NNS topics)))))

I got a bunch of promising output, but then got this error:

    Error. Can't parse test sentence: [This, is, just, a, test, .]

So clearly I need to supply more examples than just the one I have in /tmp/train.txt.
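One thing worth checking before adding more examples: the fragment shown for /tmp/train.txt was excerpted from a larger tree, and as posted it contains 10 opening and 11 closing brackets. Below is a minimal sketch of a bracket-count check for a treebank file. `BracketCheck` and `surplus` are hypothetical names, not part of the Stanford parser, and the check only compares counts, so it will not catch brackets that are balanced in number but wrongly ordered.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not part of the Stanford parser): counts brackets in bracketed trees. */
final class BracketCheck {

    /**
     * Returns the number of "(" minus the number of ")" in the text.
     * 0 means the counts match, a negative value means extra ")",
     * and a positive value means missing ")".
     */
    static int surplus(String trees) {
        int depth = 0;
        for (char c : trees.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            }
        }
        return depth;
    }

    public static void main(String[] args) throws Exception {
        // The default path is the training file from the question.
        String path = args.length > 0 ? args[0] : "/tmp/train.txt";
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        System.out.println("opening minus closing brackets: " + surplus(text));
    }
}
```

A nonzero result suggests the file should be repaired before training. This is separate from the "Can't parse test sentence" error, which concerns the built-in test sentence rather than the training file.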

Looking at the documentation, there seems to be one promising method on
LexicalizedParser that I am considering trying:

    public static LexicalizedParser getParserFromTreebank(Treebank trainTreebank,
                                                          Treebank secondaryTrainTreebank,
                                                          double weight,
                                                          GrammarCompactor compactor,
                                                          Options op,
                                                          Treebank tuneTreebank,
                                                          List<List<TaggedWord>> extraTaggedWords)

I am hesitant to jump in and try this because it seems tricky to get the Options right.
The documentation says:
"options to the parser which MUST be the SAME at both training and testing (parsing) time in
order for the parser to work properly"

So I might need guidance on how to extract the options used for
edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz. Perhaps it is

        edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams  ?

Also, maybe I want to add researchgate as one of my extraTaggedWords?

I have the feeling I am on the right track but was hoping to get some advice before descending
into a rat hole.

Thanks in advance !

chris


1 Answer

Editorial Team · Answer 1 · 2026-06-17T17:45:27+00:00


I posted to the Stanford parser mailing list and received an answer from John Bauer (thanks, John!):

John Bauer (to me and parser-user):
Unfortunately, you would need to start training from the beginning. There is no way to extend a current parser model.
That feature is on “the list”, but it’s somewhere near the back, so don’t hold your breath…
John
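Since retraining from scratch is required, one way to get a new noun into the lexicon is to make sure it appears with the desired tag in the training trees. The sketch below generates such trees for a given noun from the two desired parses in the question. `NounTreeTemplates`, `treesFor`, and the `{NOUN}` placeholder are hypothetical names for illustration, and wrapping the first fragment in `ROOT` is an assumption. Whether a handful of extra trees is enough to change the tagging has not been verified here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the Stanford parser): builds training trees for a new noun. */
final class NounTreeTemplates {

    // Templates based on the two desired parses from the question.
    // {NOUN} marks the slot for the new word. The ROOT wrapper on the first one is an assumption.
    static final String[] TEMPLATES = {
        "(ROOT (NP (NP (JJ recent) (NN activity)) (PP (IN in) (NP (PRP$ your) (NN {NOUN}) (NNS topics)))))",
        "(ROOT (NP (NP (NN subscription)) (PP (TO to) (NP (NN {NOUN})))))"
    };

    /** Returns one bracketed tree per template with the noun filled in. */
    static List<String> treesFor(String noun) {
        List<String> out = new ArrayList<>();
        for (String template : TEMPLATES) {
            out.add(template.replace("{NOUN}", noun));
        }
        return out;
    }

    public static void main(String[] args) {
        // Print the trees so they can be appended to a training treebank file.
        for (String tree : treesFor("researchgate")) {
            System.out.println(tree);
        }
    }
}
```

The output would be added to the full training treebank, not used on its own; a one-tree training file already failed on the parser's test sentence in the question.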
