I’m trying to match any bullet list in a free-text document. A bullet is defined as any number or lowercase character preceded by a word delimiter. For example:
1. item a
2. item b
I use the following code to find the bullets:
Pattern p1 = Pattern.compile("\\s[\\d][\\.\\)]\\s");
This works well as long as the bullet list consists of single-digit items. However, as soon as I try multi-digit bullet lists, it doesn’t work (for example: 12. item c 13. item d). I tried altering the pattern to
Pattern p1 = Pattern.compile("\\s[\\d]+[\\.\\)]\\s");
or
Pattern p1 = Pattern.compile("\\s[\\d]\\+[\\.\\)]\\s");
My interpretation of the regex language is that this would match any case where there are 1 or more digits preceding a “.”. But this doesn’t work.
Can anyone see what I’m doing wrong?
Your second version should work, but you can simplify it:
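One possible simplification, as a sketch: \d does not need its own brackets, and neither "." nor ")" needs escaping inside a character class. The class name SimplifiedBulletPattern and the helper findBullets are made up for this illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class SimplifiedBulletPattern {
    // Whitespace, one or more digits, then '.' or ')', then whitespace.
    static final Pattern BULLET = Pattern.compile("\\s\\d+[.)]\\s");

    // Collects every bullet marker found in the text, without the surrounding whitespace.
    static List<String> findBullets(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = BULLET.matcher(text);
        while (m.find()) {          // find() scans for the pattern anywhere in the input
            found.add(m.group().trim());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findBullets("list: 1. item a 12. item c 13) item d"));
    }
}
```

Note that this uses find(), which searches inside the input. matches() would only succeed if the whole string were a single bullet, so it is worth checking which one your code calls.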
However, it does expect whitespace before the digit (so it won’t match at the start of the string, for example). Perhaps a word boundary is useful here:
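A sketch of the word-boundary variant, assuming the trailing whitespace requirement is kept: the leading \s is swapped for \b, so a bullet at the very start of the input is found too. WordBoundaryBulletPattern and findBullets are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class WordBoundaryBulletPattern {
    // \b asserts a word boundary without consuming a character,
    // so no whitespace is required in front of the digits.
    static final Pattern BULLET = Pattern.compile("\\b\\d+[.)]\\s");

    // Collects every bullet marker found in the text, without the trailing whitespace.
    static List<String> findBullets(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = BULLET.matcher(text);
        while (m.find()) {
            found.add(m.group().trim());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findBullets("12. item c 13. item d"));
    }
}
```

Digits glued to a preceding letter, as in "abc12. ", are not matched, because there is no word boundary between a letter and a digit.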
(FYI: Your third example was trying to match a literal “+” after a single digit. That’s why it failed.)
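To see the difference for yourself, here is a small check comparing the two patterns from the question. The class name EscapedPlusDemo and the helper found are invented for this sketch.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class EscapedPlusDemo {
    // Third version from the question: "\\+" is an escaped, literal plus sign.
    static final Pattern ESCAPED_PLUS = Pattern.compile("\\s[\\d]\\+[\\.\\)]\\s");
    // Second version from the question: "+" is a quantifier meaning one or more digits.
    static final Pattern QUANTIFIER = Pattern.compile("\\s[\\d]+[\\.\\)]\\s");

    // True if the pattern occurs anywhere in the text.
    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found(ESCAPED_PLUS, " 12. item c")); // no literal '+' in the input
        System.out.println(found(ESCAPED_PLUS, " 1+. item c")); // digit, literal '+', then '.'
        System.out.println(found(QUANTIFIER, " 12. item c"));   // two digits, then '.'
    }
}
```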