What am I doing wrong here? I am trying to extract from this “list”
ARTICLE 11 - Title AA
ARTICLE 22 Title BB
ARTICLE 33
ARTICLE 44 - Title DD
ARTICLE 55 Title EE
all the article numbers and the titles (if any) for each article.
The “-” is optional when a title exists.
With this RegEx
(article)(\s*)([^\s]*)((\s*)(-)?(\s*)(.*))
I get only 4 items. Items 33 and 44 are treated as a single article, and I suppose this is just because “ARTICLE 33” has no title.
11|Title AA
22|Title BB
33|ARTICLE 44 - Title DD
55|Title EE
Please see the code here: http://jsfiddle.net/Z94wf/
EDIT
What I expect to get is this:
11|Title AA
22|Title BB
33|
44|Title DD
55|Title EE
Thanks
Your second \s* is matching the newline char on the 3rd line, so if you change it to explicitly match only spaces and the dash, as follows, you get the desired result:
http://jsfiddle.net/Z94wf/37/
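In case the fiddle is unavailable, here is a minimal sketch of that kind of change. It is not necessarily the exact pattern used in the fiddle: the character class [ -]*, the variable names, and the helper extractArticles are assumptions made for illustration. The idea is that [ -] can only consume spaces and dashes, and . does not match a newline by default, so a match can no longer run into the next line.

```javascript
// Sample input from the question, one article per line.
const text = [
  "ARTICLE 11 - Title AA",
  "ARTICLE 22 Title BB",
  "ARTICLE 33",
  "ARTICLE 44 - Title DD",
  "ARTICLE 55 Title EE",
].join("\n");

// Assumed pattern (illustrative, not taken from the fiddle):
//   (\S*)  the article number
//   [ -]*  optional spaces and dash; unlike \s*, this cannot match "\n"
//   (.*)   the title, possibly empty; "." stops at the end of the line
const pattern = /article\s*(\S*)[ -]*(.*)/gi;

// Hypothetical helper: returns one "number|title" string per article.
function extractArticles(input) {
  const results = [];
  for (const match of input.matchAll(pattern)) {
    results.push(match[1] + "|" + match[2]);
  }
  return results;
}

console.log(extractArticles(text).join("\n"));
```

With the sample list from the question, "ARTICLE 33" now yields an empty title instead of swallowing the following line.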