The Archive Base

Editorial Team
Asked: May 31, 2026

I am building a lemmatizer in Python. Since it needs to run in real time and/or process a fairly large amount of data, processing speed is of the essence.
Data: I have all possible suffixes, each linked to all the word types it can be combined with. Additionally, I have lemmaforms, each linked to both its word type(s) and its lemma(s). The program takes a word as input and outputs its lemma.
word = lemmaform + suffix

For example (Note: although the example is given in English I am not building a lemmatizer for English):

word: forbidding

lemmaform: forbidd

suffix: ing

lemma: forbid

My solution:

I have converted the data to (nested) dicts:

suffixdict : {suffix1: [type1, type2, ..., type(n)], suffix2: [type1, type2, ..., type(n)]}
lemmaformdict : {lemmaform: {type1: lemma}}

1) Find all possible suffixes and the word types they are linked to.
If the longest possible suffix is 3 characters long, the program tries to match ‘ing’, ‘ng’ and ‘g’ against the keys in suffixdict. If the key exists, it returns a value (a set of word types).

2) For each matching suffix, look up the remaining lemmaform in lemmaformdict.
If the lemmaform exists, it returns its word types.

3) Finally, the program intersects the word types produced in steps 1) and 2), and if the intersection is non-empty it returns the lemma of the word.
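For reference, the three steps above could be sketched roughly as follows. This is only an illustration of the described approach, not the asker's actual code: the function name `lemmatize`, the parameter `max_suffix_len`, and the tiny example dictionaries (with made-up word types "verb" and "noun") are all assumptions.

```python
# Hypothetical example data in the shapes described in the question.
suffixdict = {"ing": {"verb"}, "s": {"verb", "noun"}}
lemmaformdict = {"forbidd": {"verb": "forbid"}, "cat": {"noun": "cat"}}


def lemmatize(word, suffixdict, lemmaformdict, max_suffix_len=3):
    """Return the set of candidate lemmas for `word` (empty if none match)."""
    lemmas = set()
    # Step 1: try each suffix of the word, longest first,
    # always leaving at least one character for the lemmaform.
    for k in range(min(max_suffix_len, len(word) - 1), 0, -1):
        lemmaform, suffix = word[:-k], word[-k:]
        suffix_types = suffixdict.get(suffix)
        if suffix_types is None:
            continue
        # Step 2: look up what is left of the word as a lemmaform.
        type_to_lemma = lemmaformdict.get(lemmaform)
        if type_to_lemma is None:
            continue
        # Step 3: keep only the word types that both lookups agree on.
        for wordtype in type_to_lemma.keys() & suffix_types:
            lemmas.add(type_to_lemma[wordtype])
    return lemmas


print(lemmatize("forbidding", suffixdict, lemmaformdict))  # {'forbid'}
```

The sketch returns a set because several suffix/lemmaform splits may succeed for one word; picking a single lemma from that set is left open here.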

My question: could there be a better solution to my problem from the perspective of speed? (Disregarding the option of keeping frequent words and their lemmas in a dictionary.)
Help much appreciated.


1 Answer

Editorial Team
Added an answer on May 31, 2026 at 6:53 pm

    This would be a wonderful application for finite state transducers. Why? Because they allow you to do string rewriting efficiently (in time linear in the length of the input). Consider the following s[ia]mple transducer:

    [Image: diagram of the sample transducer]

    It takes a string as input and checks whether there exists a path from the initial state (here, 0) to a final state (here, 10, 12 or 17) given the sequence of input characters. If it reaches a final state, it produces the appropriate output, e.g. (forbidd, ing) if the input was “forbidding”.
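To make the idea concrete, here is a minimal sketch of such a transducer in Python. It is not a reconstruction of the diagram: the class name `Transducer`, its methods, and the example entries are assumptions made for illustration. States are plain integers, state 0 is the initial state, and each final state stores the output to emit.

```python
class Transducer:
    """Trie-shaped transducer: one transition per input character."""

    def __init__(self):
        self.transitions = [{}]  # state number -> {character: next state}
        self.outputs = {}        # final state -> output to produce

    def add(self, word, output):
        """Add a path for `word` that ends in a final state with `output`."""
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                # Create a fresh state and link it from the current one.
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.outputs[state] = output

    def run(self, word):
        """Follow `word` from state 0; return the output, or None if rejected."""
        state = 0
        for ch in word:
            state = self.transitions[state].get(ch)
            if state is None:
                return None  # no transition for this character
        return self.outputs.get(state)  # None unless we stopped in a final state


t = Transducer()
t.add("forbidding", ("forbidd", "ing"))
t.add("forbids", ("forbid", "s"))
print(t.run("forbidding"))  # ('forbidd', 'ing')
print(t.run("forbid"))      # None
```

`run` does one dictionary lookup per input character, which is where the linear running time comes from. A real transducer would normally be compiled from the suffix and lemmaform data rather than from a list of full word forms as in this sketch.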

    I don’t know whether you have any background on finite state automata, though. If not, give them a try – it will be worth the effort. 🙂 Tries are a special kind of finite state automaton (the sample transducer above is a trie), so they might be a good start.
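As a possible starting point with tries, the sketch below stores only the lemmaforms in a trie of nested dicts and checks the rest of the word against the known suffixes. The names `build_trie`, `split_word` and `END`, as well as the example data, are hypothetical. Whether this beats plain dict lookups in pure Python is something to measure rather than assume.

```python
END = ""  # marker key: a lemmaform ends at this node (never a real character)


def build_trie(lemmaforms):
    """Build a trie of nested dicts from an iterable of lemmaforms."""
    root = {}
    for form in lemmaforms:
        node = root
        for ch in form:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def split_word(word, trie, suffixes):
    """Return every (lemmaform, suffix) split of `word` found in one walk."""
    found = []
    node = trie
    for i, ch in enumerate(word):
        node = node.get(ch)
        if node is None:
            break  # no lemmaform starts with word[:i + 1]
        rest = word[i + 1:]
        # A lemmaform ends here; accept it if the rest is a known suffix.
        if END in node and rest in suffixes:
            found.append((word[:i + 1], rest))
    return found


trie = build_trie(["for", "forbid", "forbidd"])
suffixes = {"ing", "s"}
print(split_word("forbidding", trie, suffixes))  # [('forbidd', 'ing')]
print(split_word("forbids", trie, suffixes))     # [('forbid', 's')]
```

Each split found this way would still need the word type check from the question before a lemma is returned.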

© 2021 The Archive Base. All Rights Reserved