The Archive Base

Editorial Team
Asked: June 7, 2026 (2026-06-07T07:20:26+00:00)

I am building a 15k line training data document called: en-ner-person.train per the online


I am building a 15k line training data document called: en-ner-person.train per the online manual (http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html).

My question is: should my training document include an entire report, or only the lines that contain a name, such as <START:person> John Smith <END>?

For example, do I use this entire report in my training data:

<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
A nonexecutive director has many similar responsibilities as an executive director.
However, there are no voting rights with this position.
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .

Or do I only include these two lines in my training document:

<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
1 Answer

  1. Editorial Team
    Added an answer on June 7, 2026 at 7:20 am

    You should use the entire report. Sentences without names help the model learn when not to mark an entity, which reduces false positives and so improves precision.

    You can measure the difference with the evaluation tool. Reserve some sentences of your corpus for testing, for example 1/10 of the total, and train your model on the other 9/10. Train one model on the entire report and another on only the sentences with names, then evaluate both on the same test sample. The results are reported in terms of precision and recall.
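The split described above can be sketched in a few lines of Python. This is only an illustration, not part of OpenNLP: the helper names `split_corpus` and `names_only` are hypothetical, the sample sentences are the ones from the question, and holding out every n-th sentence is just one simple way to reserve a fraction of the corpus without reordering it.

```python
START_TAG = "<START:person>"

def split_corpus(sentences, test_every=10):
    """Hold out every `test_every`-th sentence for testing; the rest is training."""
    train, test = [], []
    for i, sentence in enumerate(sentences):
        if i % test_every == test_every - 1:
            test.append(sentence)
        else:
            train.append(sentence)
    return train, test

def names_only(sentences):
    """Keep only sentences that contain at least one annotated person name."""
    return [s for s in sentences if START_TAG in s]

# Sample corpus taken from the question (one sentence per line).
corpus = [
    "<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .",
    "A nonexecutive director has many similar responsibilities as an executive director.",
    "However, there are no voting rights with this position.",
    "Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .",
]

# With a 15k-line corpus, test_every=10 reserves 1/10; 4 is used here only
# because the sample has four sentences.
train_full, test_full = split_corpus(corpus, test_every=4)

# Second training variant: same training sentences, names only.
train_names_only = names_only(train_full)

# The test sample is NOT filtered: it keeps sentences without names too.
print(len(train_full), len(train_names_only), len(test_full))
```

Each of the two training lists would then be written to its own file and used to train a separate model, with both models evaluated on the same unfiltered test list.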

    Remember to build the test sample from the entire report, not only from the sentences with names. Otherwise you will not get an accurate measure of how the model performs on sentences without names.
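To see why the test sample matters, here is a minimal sketch of how precision and recall can be computed from annotated lines. It is not the OpenNLP evaluator: `person_spans` and `precision_recall` are hypothetical helpers, and the `predicted` lines are invented for illustration rather than real model output. A name the model wrongly marks in a sentence without names only shows up as a false positive if that sentence is in the test sample.

```python
def person_spans(line):
    """Return (start, end) token offsets of person names in one annotated line."""
    spans, start, index = [], None, 0
    for token in line.split():
        if token == "<START:person>":
            start = index
        elif token == "<END>":
            if start is not None:
                spans.append((start, index))
                start = None
        else:
            index += 1  # only real tokens advance the offset
    return spans

def precision_recall(gold_lines, predicted_lines):
    """Compare predicted spans with gold spans, sentence by sentence."""
    tp = fp = fn = 0
    for gold, predicted in zip(gold_lines, predicted_lines):
        g, p = set(person_spans(gold)), set(person_spans(predicted))
        tp += len(g & p)   # correctly marked names
        fp += len(p - g)   # marked, but not a name in the gold data
        fn += len(g - p)   # names the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

gold = [
    "Mr . <START:person> Vinken <END> is chairman .",
    "However , there are no voting rights with this position .",
]
# Hypothetical predictions: the second line contains a spurious name.
predicted = [
    "Mr . <START:person> Vinken <END> is chairman .",
    "<START:person> However <END> , there are no voting rights with this position .",
]

# Test sample restricted to sentences with names: the error is invisible.
print(precision_recall(gold[:1], predicted[:1]))
# Test sample built from the entire report: the false positive lowers precision.
print(precision_recall(gold, predicted))
```

In this toy case the names-only test sample reports perfect scores, while the full test sample exposes the spurious match through a lower precision.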
