I am trying to implement the Porter stemming algorithm, but I got stuck at this point:
where the square brackets denote arbitrary presence of their contents. Using (VC){m} to denote VC repeated m times, this may again be written as
[C](VC){m}[V].
m will be called the \measure\ of any word or word part when represented in this form. The case m = 0 covers the null word. Here are some examples:
m=0 TR, EE, TREE, Y, BY.
m=1 TROUBLE, OATS, TREES, IVY.
m=2 TROUBLES, PRIVATE, OATEN, ORRERY.
I don’t understand what this “measure” is or what it stands for.
It looks like the measure is the number of times a run of vowels is immediately followed by a run of consonants, that is, the number of (VC) groups in the word. For example,
“TROUBLES” has:
Optional initial consonants [C] = “TR”.
First vowels-consonants group (VC) = “OUBL”.
Second vowels-consonants group (VC) = “ES”.
Optional ending vowels [V] is empty.
So the measure is two, the number of times (VC) was “matched”.
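If it helps, here is a minimal Python sketch of that counting. The function names `is_consonant` and `measure` are my own, not from the paper. It also assumes the paper's definition of a consonant, which is not part of the excerpt quoted in the question: any letter other than A, E, I, O, U, and other than Y preceded by a consonant. That Y rule is why Y and BY come out as m=0 while IVY comes out as m=1.

```python
def is_consonant(word, i):
    """Return True if word[i] is a consonant under the assumed Porter rule."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # Assumption from the paper's definition: Y is a vowel only when
        # the letter before it is a consonant; otherwise it is a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(word):
    """Count the (VC) groups in word, i.e. m in [C](VC){m}[V]."""
    word = word.lower()
    m = 0
    prev_was_vowel = False
    for i in range(len(word)):
        cons = is_consonant(word, i)
        # A (VC) group is completed each time a vowel run gives way
        # to a consonant run.
        if cons and prev_was_vowel:
            m += 1
        prev_was_vowel = not cons
    return m


for w in ["TREE", "BY", "TROUBLE", "IVY", "TROUBLES", "ORRERY"]:
    print(w, measure(w))
```

Only the first consonant after a vowel run increments the counter, so “OUBL” counts once even though it contains two consonants.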