I’m making an application using a dependency tree parser. Actually, the parser is this


Asked: June 9, 2026


I’m building an application that uses a dependency tree parser, specifically the Stanford Parser. Occasionally it changes one or two letters of some words in the sentence I want to parse. This is a big problem for me, because I can’t see any pattern in these changes and I need the dependency tree to contain exactly the words of my sentence.

All I can see is that only some words are affected. I’m working with a database of tweets, so the data contains a lot of spelling and grammar mistakes. For example, the hashtag ‘#AllAmericanhumour’ becomes AllAmericanhumor: it loses one letter (u).

Is there anything I can do to solve this problem? My first thought was to use an edit distance algorithm, but I think there might be an easier way to do it.

Thanks everybody in advance


1 Answer

Editorial Team · Answer 1 · 2026-06-09T00:55:22+00:00

You can give options to the tokenizer with the -tokenize.options flag/property. For this particular normalization, you can turn it off with

-tokenize.options americanize=false

There are also various other normalizations that you can turn off (see PTBTokenizer or http://nlp.stanford.edu/software/tokenizer.shtml). You can turn off a lot with

-tokenize.options ptb3Escaping=false

However, the parser is trained on data that looks like the output of ptb3Escaping=true and so will tend to degrade in performance if used with unnormalized tokens. So, you may want to consider alternative strategies.

If you’re working at the Java level, you can look at the word tokens, which are actually Maps with various keys. OriginalTextAnnotation will give you the unnormalized token text, even when the token has been normalized. CharacterOffsetBeginAnnotation and CharacterOffsetEndAnnotation give the token’s character offsets into the original text.
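To illustrate the offset-based approach, here is a minimal, library-free sketch. It assumes you have already copied each token's normalized word and its begin/end character offsets out of the parser's tokens; the Token class and the originalForms method are hypothetical names made up for this example, not part of the Stanford API. The idea is just to slice the original string with the offsets, so the parser can keep its normalized tokens while you display or match on the original spelling.

```java
import java.util.ArrayList;
import java.util.List;

final class OriginalTokenRecovery {

    /** Hypothetical stand-in for a parser token (not a Stanford class). */
    static final class Token {
        final String word; // possibly normalized form, e.g. "AllAmericanhumor"
        final int begin;   // inclusive start offset into the original text
        final int end;     // exclusive end offset into the original text

        Token(String word, int begin, int end) {
            this.word = word;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Returns each token's surface form exactly as it appears in the original text. */
    static List<String> originalForms(String text, List<Token> tokens) {
        List<String> result = new ArrayList<>(tokens.size());
        for (Token t : tokens) {
            // Slice the untouched input rather than trusting the normalized word.
            result.add(text.substring(t.begin, t.end));
        }
        return result;
    }
}
```

With the text "I like AllAmericanhumour" and a token whose normalized word is "AllAmericanhumor" at offsets 7 to 24, originalForms returns the original spelling "AllAmericanhumour". This way no edit distance matching is needed.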

p.s. And you should accept some answers :-).
