I have a list of strings which are all verbs. I need to get

Asked: 2026-06-17T14:37:52+00:00

I have a list of strings which are all verbs. I need to get the word frequencies for each verb, but I want to count verbs such as "want", "wants", "wanting" and "wanted" as one verb. Formally, a “verb” is defined as a set of 4 words that are of the form {X, Xs, Xed, Xing} or of the form {X, Xes, Xed, Xing}, where X is the verb. How would I go about extracting verbs from the list such that I get "X" and a count of how many times the stem appears? I figured I could somehow use regex, but I’m new to regex and I am totally lost.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T14:37:54+00:00

There is a library called NLTK (imported as nltk) which has a huge array of functions for text processing. One subset of those functions is the stemmers, which do just what you want (using algorithms/code developed by people with a lot of experience in the area). Here is the result using the Porter stemming algorithm:

In [3]: import nltk

In [4]: verbs = ["want", "wants", "wanting", "wanted"]

In [5]: for verb in verbs:
   ...:     print(nltk.stem.porter.PorterStemmer().stem(verb))
   ...:     
want
want
want
want

You could use this in conjunction with a defaultdict to count the stems, something like this (note: in Python 2.7+, a Counter would work equally well, or better):

In [2]: from collections import defaultdict

In [3]: from nltk.stem.porter import PorterStemmer

In [4]: verbs = ["want", "wants", "wanting", "wanted", "running", "runs", "run"]

In [5]: freq = defaultdict(int)

In [6]: for verb in verbs:
   ...:     freq[PorterStemmer().stem(verb)] += 1
   ...:     

In [7]: freq
Out[7]: defaultdict(<class 'int'>, {'want': 4, 'run': 3})
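The Counter variant mentioned above might look roughly like this. It is only a sketch: `count_stems` and `naive_stem` are hypothetical names, not part of NLTK, and `naive_stem` is a crude stand-in so the snippet runs without NLTK installed.

```python
from collections import Counter


def count_stems(words, stem):
    """Count how often each stem occurs; `stem` is any word -> stem function."""
    return Counter(stem(word) for word in words)


def naive_stem(word):
    # Stand-in for illustration only: strips a single trailing "s".
    return word[:-1] if word.endswith("s") else word


freq = count_stems(["want", "wants", "runs", "run"], naive_stem)
print(freq["want"], freq["run"])
```

With NLTK available, passing `PorterStemmer().stem` as the `stem` argument should give the same counts as the defaultdict loop.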

One thing to note: the stemmers aren’t perfect. They work by stripping suffixes according to rules, so irregular forms are not merged. For instance, appending "ran" to the list above yields this as the result:

defaultdict(<class 'int'>, {'want': 4, 'run': 3, 'ran': 1})

Still, it should get you close to what you want.
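If you would rather stick to the strict definition from the question ({X, Xs, Xed, Xing} or {X, Xes, Xed, Xing}) without a stemmer, a plain-Python sketch could look like the following. The names `SUFFIXES`, `base_form` and `count_verbs` are made up for illustration, and the approach assumes the base form X itself occurs somewhere in the list.

```python
from collections import Counter

# Suffixes from the question's definition: Xs, Xes, Xed, Xing.
SUFFIXES = ("ing", "ed", "es", "s")


def base_form(word, vocabulary):
    """Return X if `word` is X plus a known suffix and X is in the list."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)]
            if candidate in vocabulary:
                return candidate
    # No suffix matched a known base form: treat the word as its own base.
    return word


def count_verbs(words):
    """Count occurrences per base form, per the question's definition."""
    vocabulary = set(words)
    return Counter(base_form(word, vocabulary) for word in words)


counts = count_verbs(
    ["want", "wants", "wanting", "wanted", "pass", "passes", "need"]
)
print(counts["want"], counts["pass"], counts["need"])
```

Because a suffix is only stripped when the remainder is itself in the list, words like "need" or "pass" are not mangled. Forms outside the definition, such as "running" (doubled consonant) or "ran", stay separate.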
