I have a scanner set up that is working on an InputStream.
I am using Scanner.nextLine() to advance to each line, then doing some regular expression work on each line.
I have a regular expression that is basically like [\w\p{Z}]+?[;\n\r] to pick up everything up to the end of the line, or just ONE field if the line is semicolon-delimited.
So if my InputStream looks like:
abcd;
xyz
It will pick up abcd;, but not xyz.
I think this is because the newline character at the end of the line of text is somehow being consumed when nextLine() is called. Can anyone tell me how to fix this problem?
As an additional point of info, I am compiling the regex pattern with Pattern.DOTALL.
Thanks!
Actually, you’re the one that’s causing the problem, by trying to consume a newline at the end of the last line. :-/ It’s perfectly valid for the last line to end abruptly without a newline character, but your regex requires it to have one. You might be able to fix that by replacing the newline with an anchor or a lookahead, but there are much easier ways to go about this.
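For reference, the anchor variant might look something like the following. This is only a minimal sketch: it assumes each line has already been read with nextLine() (which returns the line without its line separator), and the class and method names (FieldRegexDemo, fields) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldRegexDemo {
    // Same character class as in the question, but a field may now end
    // with either a semicolon or the end of the input ($) instead of a
    // mandatory newline. Pattern.DOTALL is not needed: there is no dot.
    static final Pattern FIELD = Pattern.compile("[\\w\\p{Z}]+?(?:;|$)");

    // Hypothetical helper: collect every match found in one line.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("abcd;")); // prints [abcd;]
        System.out.println(fields("xyz"));   // prints [xyz]
    }
}
```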
One is to override the default delimiter and iterate over the fields with next().
The other is to iterate over the lines with nextLine() (using the default delimiter) and then split each line on semicolons.
Scanner’s API is one of the most bloated and unintuitive I’ve ever worked with, but you can greatly reduce the pain of using it if you remember these two crucial points:
… split()).
Never call the nextXXX() methods without first calling the corresponding hasNextXXX() method.
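The two approaches described in the answer (a custom delimiter with next(), or nextLine() followed by a split on semicolons) could be sketched roughly as below. The delimiter pattern [;\r\n]+, the UTF-8 charset, the skipping of empty fields, and all class and method names are assumptions chosen for illustration, not something taken from the original post. Both loops guard each read with the matching hasNext call.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerFieldsDemo {
    // Approach 1: treat runs of semicolons and line breaks as the
    // delimiter, then read one field at a time with next().
    static List<String> fieldsByDelimiter(InputStream in) {
        List<String> fields = new ArrayList<>();
        try (Scanner sc = new Scanner(in, "UTF-8")) {
            sc.useDelimiter("[;\\r\\n]+");
            while (sc.hasNext()) {          // check before every next()
                fields.add(sc.next());
            }
        }
        return fields;
    }

    // Approach 2: keep the default delimiter, read whole lines with
    // nextLine(), and split each line on semicolons.
    static List<String> fieldsBySplit(InputStream in) {
        List<String> fields = new ArrayList<>();
        try (Scanner sc = new Scanner(in, "UTF-8")) {
            while (sc.hasNextLine()) {      // check before every nextLine()
                for (String field : sc.nextLine().split(";")) {
                    if (!field.isEmpty()) { // skip blank lines
                        fields.add(field);
                    }
                }
            }
        }
        return fields;
    }

    // Hypothetical helper so the demo does not need a real file.
    static InputStream streamOf(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String input = "abcd;\nxyz";
        System.out.println(fieldsByDelimiter(streamOf(input))); // prints [abcd, xyz]
        System.out.println(fieldsBySplit(streamOf(input)));     // prints [abcd, xyz]
    }
}
```

Note that in both versions the semicolon is treated as a separator and is not part of the returned field, unlike the regex in the question, which keeps it.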