Question

Asked: May 10, 2026 (2026-05-10T20:40:24+00:00)

Let’s say you have a text file like this one: http://www.gutenberg.org/files/17921/17921-8.txt

Does anyone have a good algorithm, or open-source code, to extract words from a text file? How can I get all the words while skipping special characters but keeping things like ‘it’s’, etc.?

I’m working in Java. Thanks

1 Answer

score 0 · Answer 1 · 2026-05-10T20:40:25+00:00

This sounds like the right job for regular expressions. Here is some Java code to give you an idea, in case you don’t know how to start:

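    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String input = "Input text, with words, punctuation, etc. Well, it's rather short.";
    // Word characters (letters, digits, underscore) plus the apostrophe
    Pattern p = Pattern.compile("[\\w']+");
    Matcher m = p.matcher(input);

    // Print each match on its own line
    while (m.find()) {
        System.out.println(input.substring(m.start(), m.end()));
    }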

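The pattern [\w']+ matches one or more consecutive word characters (letters, digits, and the underscore) or apostrophes, so each match is one word and contractions such as it's stay in one piece. The example string is printed one word per line. Have a look at the Java Pattern class documentation to read more.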
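For anyone who wants the words collected in a list rather than printed, here is a minimal sketch built on the same pattern. The class name WordExtractor and both method names are made up for illustration, and the charset passed to the file helper is an assumption the caller has to match to the actual file encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
final class WordExtractor {
    // Same character class as above: word characters plus the apostrophe.
    private static final Pattern WORD = Pattern.compile("[\\w']+");

    // Returns every match of WORD in the text, in order of appearance.
    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    // Reads the whole file into memory, then extracts its words.
    // The charset is an assumption: pass the one the file was written in.
    static List<String> extractWordsFromFile(Path file, Charset charset) throws IOException {
        String text = new String(Files.readAllBytes(file), charset);
        return extractWords(text);
    }

    public static void main(String[] args) {
        String input = "Input text, with words, punctuation, etc. Well, it's rather short.";
        for (String word : extractWords(input)) {
            System.out.println(word);
        }
    }
}
```

Two caveats if you use this on real books. By default \w in Java only covers ASCII letters, digits, and the underscore, so a word with accented letters is split at the accent; compiling the pattern with the Pattern.UNICODE_CHARACTER_CLASS flag is one way to widen it. Also, the class only contains the straight apostrophe ('), so a typographic one (’) would need to be added to the character class.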
