I have a file that I’m parsing that ALWAYS includes an email address. The file is currently laid out with a leading space before the @ and we want to capture the domain.
foo @bar.com more data here
foo @foo.com more data here
We want to pull out @bar.com and @foo.com. I’m just starting to work with regex, and I’m trying to match the pattern “@ at the start of a word boundary, inclusive of all following characters up until the next word boundary”.
I’ve tried various iterations of the following, grouping things, square brackets for the @ literal… but nothing seems to work.
EDIT – actual code :
import java.util.regex.*;
import java.io.*;
import java.nio.file.*;
import java.lang.*;
//
public class eadd
{
    public static void main(String args[])
    {
        String inputLine = "foo foofoo foo foo @bar.com foofoofoo foo foo foo";
        String eDomain = "";
        // parse eadd
        Pattern p2 = Pattern.compile("(\\b@.*\\b)");
        Matcher m2 = p2.matcher(inputLine);
        if(m2.matches()) {
            eDomain = m2.group(1);
        } else {
            eDomain = "n/a";
        }
        System.out.println(p2+" "+m2+" "+eDomain);
    }
}
And the results when I run it.
(\b@.*\b) java.util.regex.Matcher[pattern=(\b@.*\b) region=0,49 lastmatch=] n/a
All of my problems seem to come from whatever follows the @ being searched as a literal instead of a pattern (e.g., looking for the literal .* rather than any and all characters). I can’t find references to @ being a control character, so I don’t think I need to escape it.
There are no similar examples in Oracle’s Java tutorials or documentation, on SO, or in any of the online resources I checked, and I’ve been unable to find other samples of how people have handled this. Like I said, I’m fairly new to regex, but this looks to me like it should work. What am I missing?
Java won’t treat @ as a word character, so there is no word boundary between the preceding space and the @ at the start of your address. You could replace the leading word boundary with a simple whitespace match:
\s(@.+?)\b
(or "\\s(@.+?)\\b", since this is Java) should do the trick. It looks for whitespace followed by @ and matches until the next word boundary.
Edit: Oops, ., just like @, isn’t a word character (duh), so the pattern above stops at the dot and captures only @bar. Use
"\\s(@.+?)(?:\\s|$)"
to match until the next whitespace or EOF. (?:\\s|$) is a non-capturing group that will match any whitespace or end of input.
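For reference, here is a minimal sketch of how the whitespace-delimited approach might look in a complete program. The class name DomainExtractor and the method extractDomain are made up for illustration, and the sketch assumes, as the question states, that the @ is always preceded by whitespace. Note that it calls find() rather than matches(): matches() only succeeds when the entire input matches the pattern, which by itself would make the code in the question print n/a even with a corrected regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
public class DomainExtractor {
    // Whitespace, then capture "@" plus everything (lazily) up to the
    // next whitespace or the end of the input.
    private static final Pattern DOMAIN = Pattern.compile("\\s(@.+?)(?:\\s|$)");

    // Returns the captured "@domain" token, or "n/a" when none is found.
    public static String extractDomain(String line) {
        Matcher m = DOMAIN.matcher(line);
        // find() searches for the pattern anywhere in the line;
        // matches() would require the whole line to match.
        return m.find() ? m.group(1) : "n/a";
    }

    public static void main(String[] args) {
        System.out.println(extractDomain("foo foofoo foo foo @bar.com foofoofoo foo foo foo"));
        System.out.println(extractDomain("foo @foo.com"));
        System.out.println(extractDomain("no address here"));
    }
}
```

Because the pattern requires leading whitespace, an address at the very start of a line would not be picked up; that seems acceptable here given the file layout described in the question.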