I like to extract the words from the text. I have written the simple

Question


Asked: May 27, 2026


I'd like to extract the words from a text. I have written a simple regex:

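use strict;
use warnings;

my $regex = qr[\W+];    # a run of non-word characters separates two words
my @words;
while (<DATA>) {
    push @words, grep { length } split $regex;    # drop the empty field a leading separator produces
}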
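A detail worth knowing when splitting on `\W`: it matches a single non-word character, so a comma followed by a space yields an empty string between the two words. This small sketch (the sample sentence is made up for illustration) compares splitting on one non-word character with splitting on a run of them:

```perl
use strict;
use warnings;

my $line = "Hello, world";

# A single \W treats "," and " " as two separate separators,
# so an empty field appears between them.
my @single = split /\W/, $line;    # ("Hello", "", "world")

# \W+ treats the whole ", " run as one separator.
my @runs = split /\W+/, $line;     # ("Hello", "world")

print scalar(@single), " fields vs ", scalar(@runs), " fields\n";    # prints "3 fields vs 2 fields"
```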

I'd like to modify it so that proper names are kept together. A proper name may span multiple ‘words’, for example:

my @names = ('John Smith', 'Joe Smith');
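If the proper names are known in advance, as in `@names`, one option is to stop splitting on separators and instead match the tokens themselves, trying the known names before the generic `\w+` pattern. A minimal sketch, assuming each name appears in the text exactly as listed (a single space between its parts); the `tokenize` helper and the sample sentence are hypothetical:

```perl
use strict;
use warnings;

my @names = ('John Smith', 'Joe Smith');

# Longest names first, so a name wins over a shorter name that is its prefix.
# quotemeta escapes any regex metacharacters inside the names.
my $names_alt = join '|', map { quotemeta } sort { length $b <=> length $a } @names;

# At each position try a known name first, then fall back to a plain word.
# The \b anchors keep "Joe Smith" from matching inside "Joe Smithson".
my $token_re = qr/\b(?:$names_alt)\b|\w+/;

# Hypothetical helper: returns every token (name or word) found in $text.
sub tokenize {
    my ($text) = @_;
    return $text =~ /$token_re/g;
}

my @tokens = tokenize("Yesterday John Smith met Joe Smithson.");
print join('|', @tokens), "\n";    # prints "Yesterday|John Smith|met|Joe|Smithson"
```

Sorting longest-first only matters when one name is a prefix of another (say 'John' and 'John Smith'): Perl's alternation takes the first branch that matches, not the longest one.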


1 Answer

Editorial Team · Answer 1 · 2026-05-27T03:20:49+00:00

I don't think there is a definitive solution. Regular expressions are limited when applied to complex text with many irregularities, such as a web page or a book; for example, how would you handle book titles? Consider either 1) natural language processing, or 2) an index approach: find two words that start with a capital letter and are separated by a single space, then check whether either of them appears in an index of known first or last names. Good luck.
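The index approach from option 2 could be sketched roughly as follows. This is an illustration under assumptions, not a complete solution: `%known_name` is a tiny hypothetical index, `find_candidate_names` is a hypothetical helper, and "capitalized word" is simplified to an ASCII capital followed by lowercase letters:

```perl
use strict;
use warnings;

# Hypothetical index of known first and last names; a real one would be
# loaded from a much larger list.
my %known_name = map { $_ => 1 } qw(John Joe Smith);

# Hypothetical helper: returns "First Second" pairs of capitalized words,
# separated by exactly one space, where either word is in the index.
sub find_candidate_names {
    my ($text) = @_;
    my @found;
    # The second word sits in a lookahead, so it is not consumed and can
    # also start the next pair ("Yesterday John", then "John Smith").
    while ($text =~ /\b([A-Z][a-z]+) (?=([A-Z][a-z]+)\b)/g) {
        my ($first, $second) = ($1, $2);
        push @found, "$first $second"
            if $known_name{$first} || $known_name{$second};
    }
    return @found;
}

print join(', ', find_candidate_names("Joe Smith and Mary Jones met in New York.")), "\n";
# prints "Joe Smith"
```

Because overlapping pairs are all checked, "Yesterday John Smith" yields both "Yesterday John" and "John Smith"; that false positive, and the missed "Mary Jones", show why this stays a heuristic whose results are only as good as the index.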
