I installed Wordnet::Similarity and Wordnet::QueryData as an easy way to calculate information content score

Question

0

Asked: May 24, 20262026-05-24T21:50:14+00:00 2026-05-24T21:50:14+00:00

I installed Wordnet::Similarity and Wordnet::QueryData as an easy way to calculate information content score

0

I installed Wordnet::Similarity and Wordnet::QueryData as an easy way to calculate information content score and probability that comes with these modules. But I’m stuck at this basic problem: given a word, print n words similar to it – which should not be difficult that iterating through the synsets and doing join.

using the wn command and piping it with a whole lot of tr, sort | uniq I can get all the words:

 wn cat -synsn | grep -v Sense | tr '=' ' ' | tr '>' ' ' | tr '\t' ' ' | tr ',' '\n' | sort | uniq

OUTPUT

8 senses of cat                                                         
adult female
adult male
African tea
Arabian tea
big cat
bozo
cat
cat
CAT
Caterpillar
cat-o'-nine-tails
 computed axial tomography
computed tomography
computerized axial tomography
computerized tomography
CT
excitant
felid
      feline
      gossip
gossiper
gossipmonger
guy
hombre
kat
khat
      man
newsmonger
qat
quat
rumormonger
rumourmonger
      stimulant
stimulant drug
Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun cat
      tracked vehicle
true cat
      whip
      woman
X-radiation
      X-raying

but its kinda nasty,and needs further clean up.

What my script looks like is below, and what I want to get is all the words in cat#n1…8.

SCRIPT

use WordNet::QueryData;

my $wn = WordNet::QueryData->new( noload => 1);

print "Senses: ", join(", ", $wn->querySense("cat#n")), "\n";
print "Synset: ", join(", ", $wn->querySense("cat", "syns")), "\n";
print "Hyponyms: ", join(", ", $wn->querySense("cat#n#1", "hypo")), "\n";

OUTPUT:

Senses: cat#n#1, cat#n#2, cat#n#3, cat#n#4, cat#n#5, cat#n#6, cat#n#7, cat#n#8
Synset: cat#n, cat#v
Hyponyms: domestic_cat#n#1, wildcat#n#3

SCRIPT

use WordNet::QueryData;
my $wn = WordNet::QueryData->new;

foreach $word (qw/cat#n/) {

    @senses = $wn->querySense($word);

    foreach $wps (@senses) {
            @gloss = $wn -> querySense($wps, "syns");
            print "$wps : @gloss\n";
    }

}

OUTPUT:

cat#n#1 : cat#n#1 true_cat#n#1
cat#n#2 : guy#n#1 cat#n#2 hombre#n#1 bozo#n#2
cat#n#3 : cat#n#3
cat#n#4 : kat#n#1 khat#n#1 qat#n#1 quat#n#1 cat#n#4 Arabian_tea#n#1 African_tea#n#1
cat#n#5 : cat-o'-nine-tails#n#1 cat#n#5
cat#n#6 : Caterpillar#n#2 cat#n#6
cat#n#7 : big_cat#n#1 cat#n#7
cat#n#8 : computerized_tomography#n#1 computed_tomography#n#1 CT#n#2 computerized_axial_tomography#n#1 computed_axial_tomography#n#1 CAT#n#8

P.S.
I have never written perl before, but have been looking into perl scripts since morning – and can now understand the basic stuff. Just need to know if there is cleaner way to do this using the api docs – couldn’t figure out from the api or usergroup archives.

Update:

I think I’ll settle with:

 wn cat -synsn | sed '1,6d' |sed 's/Sense [[:digit:]]//g' | sed 's/[[:space:]]*=> //' | sed '/^$/d'

sed rocks!

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-24T21:50:16+00:00

I think you’ll find the following hepful…

http://marimba.d.umn.edu/WordNet-Pairs/

What are the N most similar words to X, according to WordNet?

This data seeks to answer that question, where similarity is based on
measures from WordNet::Similarity. http://wn-similarity.sourceforge.net

————– verb data

These files were created with WordNet::Similarity version 2.05 using
WordNet 3.0. They show all the pairwise verb-verb similarities found
in WordNet according to the path, wup, lch, lin, res, and jcn measures.
The path, wup, and lch are path-based, while res, lin, and jcn are based
on information content.

As of March 15, 2011 pairwise measures for all verbs using the six
measures above are availble, each in their own .tar file. Each *.tar
file is named as WordNet-verb-verb-MEASURE-pairs.tar, and is approx
2.0 – 2.4 GB compressed. In each of these .tar files you will find
25,047 files, one for each verb sense. Each file consists of 25,048 lines,
where each line (except the first) contains a WordNet verb sense and the
similarity to the sense featured in that particular file. Doing
the math here, you find that each .tar file contains about 625,000,000
pairwise similarity values. Note that these are symmetric (sim (A,B)
= sim (B,A)) so you have a bit more than 300 million unique values.

————– noun data

As of August 19, 2011 pairwise measures for all nouns using the path
measure are available. This file is named WordNet-noun-noun-path-pairs.tar.
It is approximately 120 GB compressed. In this file you will find
146,312 files, one for each noun sense. Each file consists of
146,313 lines, where each line (except the first) contains a WordNet
noun sense and the similarity to the sense featured in that particular
file. Doing the math here, you find that each .tar file contains
about 21,000,000,000 pairwise similarity values. Note that these
are symmetric (sim (A,B) = sim (B,A)) so you have around 10 billion
unique values.

We are currently running wup, res, and lesk, but do not have an
estimated date of availability yet.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I installed Wordnet::Similarity and Wordnet::QueryData as an easy way to calculate information content score

Leave an answerCancel reply

1 Answer

————– verb data

————– noun data

Leave an answer
Cancel reply