If I have the text
<page a>The cat ran#$(*#(%)#over(*@#$the(*#%
and I am using a Scanner and its useDelimiter method, what regex would let me extract:
<page a>
The
cat
ran
over
the
So far I have tried:
s.useDelimiter("[^a-zA-Z]|^(<.*>$)");
but that does not leave the angle brackets intact; it strips them out (obviously, because the [^a-zA-Z] alternative matches them as delimiters).
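To see the effect concretely, here is a minimal sketch that uses a simplified delimiter, [^a-zA-Z]+, in place of the one tried above. The `+` is an assumption added so that runs of junk count as one delimiter and Scanner returns no empty tokens. Every non-letter, including `<`, the space and `>`, is consumed as a delimiter, so the tag comes back as the two plain words `page` and `a`. The class name `DelimiterDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Splits the input on runs of non-letters; '<', ' ' and '>' are all treated as delimiters.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Scanner s = new Scanner(input).useDelimiter("[^a-zA-Z]+");
        while (s.hasNext()) {
            out.add(s.next());
        }
        s.close();
        return out;
    }

    public static void main(String[] args) {
        // prints [page, a, The, cat, ran, over, the]
        System.out.println(tokens("<page a>The cat ran#$(*#(%)#over(*@#$the(*#%"));
    }
}
```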
The problem is not so much one of delimiters as one of token recognition. Your tokens are:
<page a>
The
cat
ran
over
the
Encoding the “<” character anywhere in the set of delimiters pretty much ensures that it won’t appear in the returned tokens. If you know that <page a> occurs at the beginning of the string (and I realize that might be an invalid assumption), you can do something like this:
Obviously, that’s a quick hack (though I did test it). But you could easily extend it, I think.
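Because the real issue is token recognition, one alternative is to describe the tokens themselves rather than the delimiters and pull them out with Matcher.find(). This is a swapped-in technique, not the Scanner hack described above. Here is a minimal sketch that assumes a tag is anything from `<` to the next `>` and that every other token is a run of ASCII letters. The pattern and the class name `TokenDemo` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenDemo {
    // A token is either a whole tag such as "<page a>" or a run of letters.
    // The tag alternative comes first so "<page a>" is matched as one token.
    private static final Pattern TOKEN = Pattern.compile("<[^>]*>|[a-zA-Z]+");

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {          // characters between matches are skipped as junk
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : tokens("<page a>The cat ran#$(*#(%)#over(*@#$the(*#%")) {
            System.out.println(t);  // <page a>, The, cat, ran, over, the (one per line)
        }
    }
}
```

The tag does not have to be at the start of the string with this approach, since find() scans the whole input. Scanner.findWithinHorizon(pattern, 0) can be used the same way if you want to stay with a Scanner.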