I would like to extract the substring between two given words using Java.
For example:
This is an important example about regex for my work.
I would like to extract everything between “an” and “for”.
What I have so far is:
String sentence = "This is an important example about regex for my work and for me";
Pattern pattern = Pattern.compile("(?<=an).*.(?=for)");
Matcher matcher = pattern.matcher(sentence);
boolean found = false;
while (matcher.find()) {
    System.out.println("I found the text: " + matcher.group().toString());
    found = true;
}
if (!found) {
    System.out.println("I didn't found the text");
}
It works well.
But I want to do two additional things:
- If the sentence is:
  This is an important example about regex for my work and for me.
  I want to extract up to the first “for”, i.e. important example about regex
- Sometimes I want to limit the number of words between the two patterns to 3 words, i.e. important example about
Any ideas please?
For your first question, make the quantifier lazy. Put a question mark after the quantifier (.*? instead of .*) and it will match as little as possible, so the match stops at the first “for” instead of the last one.
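As a minimal sketch of the difference, the following compares the greedy and the lazy form of the pattern on the sentence from the question. The class name LazyMatchDemo and the helper firstMatch are made up for this illustration; only the first match is returned, and the matched text keeps its surrounding spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyMatchDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String sentence = "This is an important example about regex for my work and for me";

        // Greedy: .* runs to the end and backtracks to the LAST "for".
        System.out.println("[" + firstMatch("(?<=an).*(?=for)", sentence) + "]");
        // prints "[ important example about regex for my work and ]"

        // Lazy: .*? grows one character at a time and stops at the FIRST "for".
        System.out.println("[" + firstMatch("(?<=an).*?(?=for)", sentence) + "]");
        // prints "[ important example about regex ]"
    }
}
```

Calling trim() on the result removes the leading and trailing space if they are not wanted.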
I have no idea what the additional "." at the end of ".*." is good for; it is unnecessary.

For your second question you have to define what a “word” is. I would say here it is probably just a sequence of non-whitespace characters followed by a whitespace character, i.e. \S+\s, and you repeat this 3 times with (\S+\s){3}.

To ensure that the pattern matches only on whole words, use word boundaries (\b) around “an” and “for”.
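The word-counting idea can be sketched as follows. The names WordLimitDemo, between and firstWords are hypothetical. The between helper keeps the lookahead for “for”, so it only matches when the number of words between the two markers falls inside the given range. The firstWords helper is an assumption about what the question is after (cutting the result off after at most N words); its negative lookahead (?!for\b) is an addition that is not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordLimitDemo {
    // One "word": a run of non-whitespace characters followed by one whitespace character.
    private static final String WORD = "\\S+\\s";

    // Matches only if between min and max words lie between the whole words "an" and "for".
    static String between(String input, int min, int max) {
        Pattern p = Pattern.compile(
                "(?<=\\ban\\b)\\s(?:" + WORD + "){" + min + "," + max + "}(?=\\bfor\\b)");
        Matcher m = p.matcher(input);
        return m.find() ? m.group().trim() : null;
    }

    // Takes at most max words after "an" and never runs past the whole word "for".
    static String firstWords(String input, int max) {
        Pattern p = Pattern.compile(
                "(?<=\\ban\\b)\\s(?:(?!for\\b)" + WORD + "){1," + max + "}");
        Matcher m = p.matcher(input);
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        String sentence = "This is an important example about regex for my work and for me";
        System.out.println(between(sentence, 1, 3));  // null: four words lie between "an" and "for"
        System.out.println(between(sentence, 1, 5));  // important example about regex
        System.out.println(firstWords(sentence, 3));  // important example about
    }
}
```

Note that with the lookahead in place, a range such as {1,3} does not shorten a longer match; it makes the whole pattern fail when more than three words lie between the markers.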
{3} will match exactly 3 words. For a minimum of 1 and a maximum of 3, use {1,3}.

Alternative:
As dma_k correctly stated, in your case it is not necessary to use lookbehind and lookahead. See the Matcher documentation about groups.
You can use capturing groups instead. Just put the part you want to extract in parentheses and it will be put into a capturing group.
You can then access this group through the matcher. You have only one pair of parentheses, so it is simple: just pass 1, as in matcher.group(1), to access the first capturing group.
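A short sketch of the capturing-group variant, assuming the markers should be matched as whole words and the quantifier stays lazy. The class name CaptureGroupDemo and the method extract are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    // The parentheses form capturing group 1: everything between the words "an" and "for".
    private static final Pattern BETWEEN = Pattern.compile("\\ban\\b(.*?)\\bfor\\b");

    // Returns the trimmed content of group 1 for the first match, or null if nothing matches.
    static String extract(String sentence) {
        Matcher matcher = BETWEEN.matcher(sentence);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String sentence = "This is an important example about regex for my work and for me";
        String text = extract(sentence);
        if (text != null) {
            System.out.println("I found the text: " + text);
            // prints "I found the text: important example about regex"
        } else {
            System.out.println("I didn't find the text");
        }
    }
}
```

Here matcher.group() or matcher.group(0) would return the whole match including “an” and “for”, while matcher.group(1) returns only the part inside the parentheses.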