The Archive Base

Editorial Team
Asked: June 4, 2026

I never really dealt with NLP but had an idea about NER which should


I have never really worked with NLP, but I had an idea for NER that should NOT have worked and yet somehow works exceptionally well in one case. I do not understand why it works there, why it fails elsewhere, or whether it can be extended.

The idea was to extract names of the main characters in a story through:

  1. Building a dictionary keyed by each word in the text
  2. Filling each word's entry with the list of words that appear immediately after it in the text
  3. Finding, for each word, the word whose list overlaps most with its own (meaning the two words are used similarly in the text)
  4. Assuming that, given the name of one character in the story, the words used like it are character names as well (bogus, and this is the part that should not work, but since I had never dealt with NLP until this morning I started the day naive)
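Steps 1 to 3 can be illustrated on a toy text. This is only a minimal sketch, not the attached program: the names `follower_sets` and `most_similar` and the toy sentences are made up for illustration, and tokenization is simplified to `re.findall`.

```python
import re
from collections import defaultdict


def follower_sets(text):
    """Map each word to the set of words that directly follow it."""
    followers = defaultdict(set)
    words = re.findall(r"\w+", text)
    for prev, word in zip(words, words[1:]):
        followers[prev].add(word)
    return followers


def most_similar(followers, target):
    """Return (word, overlap) for the word sharing the most followers with target."""
    target_ctx = followers.get(target, set())
    best_word, best_overlap = None, 0
    for word, ctx in followers.items():
        if word == target:
            continue
        overlap = len(ctx & target_ctx)
        if overlap > best_overlap:
            best_word, best_overlap = word, overlap
    return best_word, best_overlap


# Hypothetical toy text: both names are followed by "said" and "went".
toy = "Alice said hello. Alice went home. Rabbit said goodbye. Rabbit went away."
followers = follower_sets(toy)
print(most_similar(followers, "Rabbit"))  # ('Alice', 2)
```

In this toy text "Rabbit" is paired with "Alice" only because both are followed by the same verbs, which is the effect the idea relies on.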

I ran the overly simple code (attached below) on Alice in Wonderland, which for “Alice” returns:

21 [‘Mouse’, ‘Latitude’, ‘William’, ‘Rabbit’, ‘Dodo’, ‘Gryphon’, ‘Crab’, ‘Queen’, ‘Duchess’, ‘Footman’, ‘Panther’, ‘Caterpillar’, ‘Hearts’, ‘King’, ‘Bill’, ‘Pigeon’, ‘Cat’, ‘Hatter’, ‘Hare’, ‘Turtle’, ‘Dormouse’]

Although the code only filters for capitalized words (and is given “Alice” as the word to cluster around), the text originally contains ~500 capitalized words, so the result is still pretty spot on as far as main characters go.

It does not work as well for other characters or in other stories, though it gives interesting results.

Any idea whether this approach is usable or extendable, or why it works at all in this story for “Alice”?

Thanks!

# English name recognition
import re
import sys


def mimic_dict(filename):
  """Map each word to the list of words that immediately follow it."""
  followers = {}
  with open(filename) as f:
    text = f.read()
  prev = ""
  for token in text.split():
    # Keep only the first run of word characters, dropping punctuation.
    m = re.search(r"\w+", token)
    if m is None:
      continue
    word = m.group()
    followers.setdefault(prev, []).append(word)
    prev = word
  return followers


def main():
  if len(sys.argv) != 2:
    print('usage: ./main.py file-to-read')
    sys.exit(1)

  followers = mimic_dict(sys.argv[1])

  # Report every capitalized word longer than one character.
  upper = [w for w in followers if len(w) > 1 and w[0].isupper()]
  print(len(upper), upper)

  # Hand-picked words to ignore.
  exclude = ["ME","Yes","English","Which","When","WOULD","ONE","THAT","That","Here","and","And","it","It","me"]
  for s in exclude:
    followers.pop(s, None)

  # For each word, find the other word sharing the most distinct followers.
  follower_sets = {w: set(ws) for w, ws in followers.items()}
  scores = {}
  for key1 in follower_sets:
    best = 0
    for key2 in follower_sets:
      if key1 == key2:
        continue
      shared = len(follower_sets[key1] & follower_sets[key2])
      if shared > best:
        best = shared
        scores[key1] = (key2, best)

  # Keep the capitalized words whose closest match is "Alice".
  names = [w for w in scores if scores[w][0] == "Alice" and w[0].isupper()]
  print(len(names), names)


if __name__ == '__main__':
  main()
1 Answer

Editorial Team
Added an answer on June 4, 2026 at 12:07 am

    Judging from your program and from previous experience with NER, I’d say this “works” because you’re not doing a proper evaluation. You’ve found “Hare” where you should have found “March Hare”.
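One way to make the evaluation concrete is to score the output against a hand-written list of expected names. The sketch below assumes exact string matching; the `gold` set is a hypothetical, hand-picked illustration (not an annotated corpus) and `predicted` is a subset of the output quoted in the question.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall of predicted names against gold names."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Subset of the program's output for "Alice".
predicted = ["Mouse", "Latitude", "William", "Rabbit", "Dodo", "Hare", "Turtle"]
# Hypothetical gold list written by hand for illustration only.
gold = {"Mouse", "Dodo", "White Rabbit", "March Hare", "Mock Turtle"}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under exact matching, “Hare” earns no credit for “March Hare”, so partial names count as errors on both the precision and the recall side.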

    The difficulty in NER (at least for English) is not finding the names. It is detecting their full extent (the “March Hare” example), detecting them even at the start of a sentence, where every word is capitalized whether or not it is a name, and classifying them as person/organisation/location/etc.
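The extent and sentence-start problems show up even in a naive baseline that glues consecutive capitalized tokens together. This is a sketch for illustration only; `capitalized_runs` is a made-up helper, not an established NER method.

```python
import re


def capitalized_runs(text):
    """Return maximal runs of consecutive capitalized words as candidate names."""
    runs, current = [], []
    # Split into words and single punctuation marks so punctuation ends a run.
    for token in re.findall(r"\w+|[^\w\s]", text):
        if token[0].isupper():
            current.append(token)
        else:
            if current:
                runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs


print(capitalized_runs("The March Hare and Alice met the Hatter."))
# ['The March Hare', 'Alice', 'Hatter']
```

The run now covers “March Hare”, but the sentence-initial “The” is swallowed as well, which is exactly the start-of-sentence ambiguity described above.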

    Also, Alice in Wonderland, being a children’s novel, is a rather easy text to process. Newswire phrases like “Microsoft CEO Steve Ballmer” pose a much harder problem; here, you’d want to detect

    [ORG Microsoft] CEO [PER Steve Ballmer]
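To show what such a target looks like per token, the bracketed notation can be converted to BIO tags (B = first token of an entity, I = inside, O = outside). BIO is a common tagging convention and an assumption here, not something the answer prescribes; `to_bio` is a hypothetical helper that only handles this simple bracket format.

```python
import re


def to_bio(annotated):
    """Convert '[LABEL words ...]' markup into (token, BIO tag) pairs."""
    pairs = []
    # Each match is either a bracketed entity or a plain token outside brackets.
    pattern = r"\[(\w+) (\S[^\]]*)\]|(\S+)"
    for label, entity, plain in re.findall(pattern, annotated):
        if label:
            words = entity.split()
            pairs.append((words[0], "B-" + label))
            pairs.extend((w, "I-" + label) for w in words[1:])
        else:
            pairs.append((plain, "O"))
    return pairs


print(to_bio("[ORG Microsoft] CEO [PER Steve Ballmer]"))
# [('Microsoft', 'B-ORG'), ('CEO', 'O'), ('Steve', 'B-PER'), ('Ballmer', 'I-PER')]
```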
    

© 2021 The Archive Base. All Rights Reserved