How can I find an email address inside HTML code with Nokogiri?
I suppose I will need to use a regex, but I don’t know how.
Example code
<html>
<title>Example</title>
<body>
This is an example text.
example@example.com
</body>
</html>
There is an answer covering the case when there is a href to mail_to, but that is not my case. The email addresses are sometimes inside a link, but not always.
Thanks
If you’re only trying to pull email addresses out of a string that happens to be HTML, you don’t need Nokogiri for this; a regular expression applied to the raw string is enough.
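As a minimal sketch of that approach, assuming a deliberately simplified pattern (the pattern and the variable names `html`, `email_pattern`, and `emails` are illustrative assumptions, not a complete matcher), you can scan the raw HTML string directly:

```ruby
# Sample input taken from the question.
html = <<~HTML
  <html>
  <title>Example</title>
  <body>
  This is an example text.
  example@example.com
  </body>
  </html>
HTML

# Simplified pattern (an assumption): local part, "@", domain, dot, TLD.
# It is intentionally not RFC-complete.
email_pattern = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i

# String#scan returns every non-overlapping match; uniq drops repeats,
# e.g. when the same address appears both in a link and in the text.
emails = html.scan(email_pattern).uniq

puts emails
```

Because the scan runs over the whole string, it should pick up addresses whether they sit in plain text or inside a link's `href`.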
This isn’t a perfect solution though, as the RFC for what constitutes a ‘valid’ email address is very lenient. This means most regular expressions you come across (the above one included) do not account for valid edge-case addresses. For example, according to the RFC
is a valid email address, but it will not be matched by the above regular expression as it stands.
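To make the limitation concrete, here is a small sketch using one well-known edge case, a quoted local part, which RFC 5322 permits. Both the address and the simplified pattern below are assumptions chosen for illustration; the address is not necessarily the one meant above.

```ruby
# Same simplified pattern as a typical "good enough" email regex (an assumption).
email_pattern = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i

ordinary = 'Contact: example@example.com'
# Quoted local part containing a space: valid per RFC 5322, but rare in practice.
quoted = 'Contact: "john doe"@example.com'

ordinary_matches = ordinary.scan(email_pattern)
quoted_matches   = quoted.scan(email_pattern)

puts ordinary_matches.inspect
puts quoted_matches.inspect # nothing is found: the character before "@" is a quote
```

For scraping real pages this trade-off is usually acceptable, since such addresses almost never appear in practice, but it is worth knowing that the pattern is a heuristic and not a validator.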