I’m no expert in regex, but I need to parse some input I have no control over and filter out any strings that don’t contain letters (a-z, A-Z) and/or digits (0-9).
When I run this,
Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); // fixed typo
if (!p.matcher(gottenData).matches())
    System.out.println(someData); // someData contains gottenData
certain spaces + an unknown symbol somehow slip through the filter (gottenData is the red rectangle):

In case you’re wondering, it DOES also display Text, it’s not all like that.
For now, I don’t mind the [?] as long as it also contains some string along with it.
Please help.
[EDIT] As far as I can tell from the (very large) input, the [?]’s are either whitespace or nothing at all. Maybe there’s some sort of encoding issue, or perhaps it has something to do with #text nodes (the input is XML).
The * quantifier matches “zero or more”, which means it will match a string that does not contain any of the characters in your class. Try the + quantifier, which means “One or more”:
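^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only.
^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* may make it noticeably slower.
If you use Matcher.find() instead of Matcher.matches(), a full string match is not required and you can use the regex [a-zA-Z0-9]+. Note that Matcher.lookingAt() also drops the full-match requirement, but it still anchors at the start of the input, so it only succeeds when the string begins with an alphanumeric character.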
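To make the difference concrete, here is a minimal sketch comparing the options. The class name AlnumFilter, its method names, and the sample strings are made up for illustration and are not from the question. It only assumes the standard java.util.regex API.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class AlnumFilter {
    // The pattern from the question: '*' also accepts the empty string.
    private static final Pattern STAR = Pattern.compile("^[a-zA-Z0-9]*$");
    // One or more alphanumerics; anchoring depends on the Matcher method used.
    private static final Pattern ALNUM = Pattern.compile("[a-zA-Z0-9]+");

    // Original behaviour: true for "" as well as for purely alphanumeric strings.
    static boolean matchesStar(String s) {
        return STAR.matcher(s).matches();
    }

    // matches() requires the whole string to match: alphanumerics only, at least one.
    static boolean isAllAlnum(String s) {
        return ALNUM.matcher(s).matches();
    }

    // find() scans the whole input: true if an alphanumeric occurs anywhere.
    static boolean containsAlnum(String s) {
        return ALNUM.matcher(s).find();
    }

    // lookingAt() anchors at the start only: true if the string begins with an alphanumeric.
    static boolean startsWithAlnum(String s) {
        return ALNUM.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        // Sample inputs are assumptions; \uFFFD stands in for the unknown [?] symbol.
        String[] samples = {"", "abc123", "  \uFFFD ", " some text "};
        for (String s : samples) {
            System.out.printf("[%s] star=%b all=%b contains=%b starts=%b%n",
                    s, matchesStar(s), isAllAlnum(s), containsAlnum(s), startsWithAlnum(s));
        }
    }
}
```

If the goal is to keep anything that has at least some text in it, containsAlnum (the find() variant) is probably the closest fit. isAllAlnum would also reject strings with surrounding whitespace, which seems likely for #text nodes.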