I want to be able to extract text from text files as tokens

Asked: June 1, 2026

I want to be able to extract the text of a text file as tokens. For example, say I have a text file that contains:

It’s a good restaurant,

believe me!

I want to extract the contents of this as ‘tokens’: the first token would be “It’s”, the next “ ”, then “a”, “ ”, “good”, “ ”, “restaurant”, “,”, “\n”, “believe”, “ ”, “me”, and “!”. So I guess one way of putting it is that each token is either a word or not a word.

Here is what I have so far (I check whether the token is a word elsewhere in the program; this method just returns the next token):

public Token next() {
    if (c == -1) {
        throw new NoSuchElementException();
    }

    Writer sw = new CharArrayWriter();
    try {
        while (c != -1 && Character.isLetter(c)) {
            sw.write(c);
            c = r.read();
        }
        while (c != -1 && !Character.isLetter(c)) {
            c = r.read();
        }
    } catch (IOException e) {
        c = -1;
        return null;
    }
    return null;
}

Right now I have the return values as ‘null’ because I’m not sure how to use the writer to export it as tokens. Does anyone have any tips for this? Thank you!
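For reference, here is a minimal sketch of how the reader-based approach above could be completed. It is an illustration under stated assumptions, not the asker's actual program: the class name `ReaderTokenizer` and the helper `isWordChar` are hypothetical, tokens are returned as plain `String` values because the `Token` class is not shown, apostrophes are treated as word characters so that "It's" stays in one piece, and every non-word character becomes its own one-character token. The key step is collecting the characters in a buffer and calling `toString()` on it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical class name; yields String tokens since Token is not shown.
class ReaderTokenizer implements Iterator<String> {
    private final Reader r;
    private int c; // one character of lookahead, or -1 at end of input

    ReaderTokenizer(Reader reader) {
        this.r = reader;
        this.c = read();
    }

    // Wraps the checked IOException so next() keeps the Iterator signature.
    private int read() {
        try {
            return r.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumption: letters and apostrophes (straight or curly) form words.
    // A stray apostrophe on its own is therefore also reported as a word.
    private static boolean isWordChar(int ch) {
        return Character.isLetter(ch) || ch == '\'' || ch == '\u2019';
    }

    @Override
    public boolean hasNext() {
        return c != -1;
    }

    @Override
    public String next() {
        if (c == -1) {
            throw new NoSuchElementException();
        }
        StringBuilder sb = new StringBuilder();
        if (isWordChar(c)) {
            // Word token: consume the whole run of word characters.
            while (c != -1 && isWordChar(c)) {
                sb.append((char) c);
                c = read();
            }
        } else {
            // Non-word token: exactly one character (space, comma, newline...).
            sb.append((char) c);
            c = read();
        }
        return sb.toString();
    }

    // Convenience helper that tokenizes an in-memory string.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        ReaderTokenizer t = new ReaderTokenizer(new StringReader(text));
        while (t.hasNext()) {
            tokens.add(t.next());
        }
        return tokens;
    }
}
```

For a file, the same class could be constructed with a `java.io.FileReader` (ideally wrapped in a `BufferedReader`) instead of a `StringReader`.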

1 Answer

Editorial Team · Answer 1 · 2026-06-01T13:49:35+00:00

A solution using the Matcher class could solve your issue:

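import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each run of letters, digits, punctuation, or whitespace becomes one token.
Matcher m = Pattern.compile("\\p{Alpha}+|\\p{Digit}+|\\p{Punct}+|\\p{Space}+").matcher("It's a good restaurant, believe me!");
while (m.find())
    System.out.println(">" + m.group() + "<");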

This regex may not be exactly the right one, but you can build a better one from it. See the Pattern documentation at:

http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
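As one possible refinement, note that `\p{Punct}` matches the apostrophe, so the pattern above splits "It's" into three tokens. The sketch below is one way to keep such words whole; it is an assumption about what the asker wants, not a definitive tokenizer. The names `RegexTokenizer`, `tokenize`, and `isWord` are hypothetical. A word is a run of letters with optional internal apostrophes, a run of whitespace is one token, and any other character is a token by itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating a word / non-word split.
class RegexTokenizer {
    private static final Pattern TOKEN = Pattern.compile(
            "\\p{L}+(?:['\u2019]\\p{L}+)*" // word, allowing internal apostrophes
            + "|\\s+"                       // run of whitespace
            + "|[^\\p{L}\\s]");             // any other single character

    // Returns all tokens in order; together they cover the whole input.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // A token is a word if it starts with a letter (tokens are never empty).
    static boolean isWord(String token) {
        return Character.isLetter(token.codePointAt(0));
    }

    public static void main(String[] args) {
        for (String token : tokenize("It's a good restaurant,\nbelieve me!")) {
            System.out.println((isWord(token) ? "word     >" : "non-word >") + token + "<");
        }
    }
}
```

Because the three alternatives between them match every possible character, concatenating the tokens reproduces the original text, which makes the split easy to check.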
