It’s best to start with an example and what I’ve gotten so far.
Sample Data:
FOO foo@acme.com 5545
<Data><Name>tester</Name><Foo>bar</Foo></Data>
Current regex:
/FOO\s(.{1,20}@[^\s]+)\s.{0,20}\s{1,2}(<Data>.{0,100}<Name>(.{0,20})<\/Name>.{0,100}<\/Data>)?/m
Matches from regex:
- foo@acme.com
- <Data><Name>tester</Name><Foo>bar</Foo></Data>
- tester
I’ve wrapped the <Data> section in parentheses followed by a ? because the entire data section may or may not exist. However, the <Name> section is also optional. So I tried putting parentheses around <Name> with a question mark as well, but then I don’t get the matches:
/FOO\s(.{1,20}@[^\s]+)\s.{0,20}\s{1,2}(<Data>.{0,100}(<Name>(.{0,20})<\/Name>)?.{0,100}<\/Data>)?/m
I’ve posted my regex and sample data on a regex site to make it easier to test/validate what I’m trying to do: http://www.rubular.com/r/ZhQzlNp1vv
In the <Data> section there is <Name> and even <Foo>. The point is, there may be many different elements in <Data> and I only care about extracting data from some of them. I need to use regex for my particular situation so please don’t suggest using some XML parsing library (thanks!).
Thanks in advance.
/FOO\s(\S+@\S+).*?\n(?:<Data>.{0,100}<Name>(.{0,20})<\/Name>.{0,100}<\/Data>)?/m
http://www.rubular.com/r/IhisH7HYJR
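As for why the attempt with an optional <Name> group stops capturing: the greedy .{0,100} in front of it swallows the whole element, and the engine only backtracks until the rest of the pattern can match. Because the <Name> group is allowed to match nothing, that happens as soon as </Data> is reachable, so the group never takes part. One possible way around this, as a rough sketch rather than a definitive fix, is to move a lazy prefix inside the optional group and stop every wildcard from running past </Data>. The names PATTERN and extract below are made up for illustration, and the sketch assumes <Name> may sit anywhere inside <Data>:

```ruby
# Sketch only: PATTERN and extract are illustrative names, not from the thread.
PATTERN = /
  FOO\s(\S+@\S+)                  # group 1: the address after FOO
  [^\n]*\n                        # rest of the FOO line
  (?:
    <Data>
    (?:                           # optional Name part, lazy prefix inside it
      (?:(?!<\/Data>).){0,100}?   # skip other elements, never past the end tag
      <Name>(.{0,20}?)<\/Name>    # group 2: the name, nil when absent
    )?
    (?:(?!<\/Data>).){0,100}      # whatever is left inside Data
    <\/Data>
  )?
/mx

# Returns a hash with :email and :name, or nil when no FOO record is found.
def extract(text)
  m = PATTERN.match(text)
  return nil unless m
  { email: m[1], name: m[2] }
end

sample = "FOO foo@acme.com 5545\n<Data><Name>tester</Name><Foo>bar</Foo></Data>"
puts extract(sample)[:name]   # prints tester
```

When <Name> is missing, or the whole <Data> section is missing, the match still succeeds and the name comes back as nil.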