I’m trying to write a regex for a JavaScript split(), but I’m totally stuck. Here’s my input:
9:30 pm
The user did action A.
10:30 pm
Welcome, user John Doe.
***This is a comment
11:30 am
This is some more input.
I want the output array after the split() to be (I’ve removed the \n for readability):
["9:30 pm The user did action A.", "10:30 pm Welcome, user John Doe.", "***This is a comment", "11:30 am This is some more input." ];
My current regular expression is:
var split = text.split(/\s*(?=(\b\d+:\d+|\*\*\*))/);
This works, but there is one problem: the timestamps get repeated in extra elements. So I get:
["9:30", "9:30 pm The user did action A.", "10:30", "10:30 pm Welcome, user John Doe.", "***This is a comment", "11:30", "11:30 am This is some more input." ];
I can’t split on the newlines \n because they aren’t consistent, and sometimes there may be no newlines at all.
Could you help me out with a Regex for this?
Thanks so much!!
EDIT: in reply to phleet
It could look like this:
9:30 pm
The user did action A.
He also did action B
10:30 pm Welcome, user John Doe.
Basically, there may or may not be a newline after the timestamp, and there may be multiple newlines for the event description.
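I believe the issue is how JavaScript’s split treats capturing groups: when the separator pattern contains a capturing group, the captured text is spliced into the result array as extra elements. The solution may just be to use a non-capturing group in your pattern. That is, instead of (\b\d+:\d+|\*\*\*), use (?:\b\d+:\d+|\*\*\*).
The (?:___) construct is what is called a non-capturing group.
Looking at the overall pattern, however, the grouping is not actually needed, because the lookahead (?=___) already delimits the alternation. You should be able to just use /\s*(?=\b\d+:\d+|\*\*\*)/.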
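To make the difference concrete, here is a minimal sketch that runs both patterns against the sample input from the question. The variable names (text, withCapture, entries, normalized, editText, editEntries) are made up for illustration, and the whitespace collapsing at the end is only there so the output is easy to compare with the array the question asks for.

```javascript
// Sample input from the question, joined with newlines.
const text = [
  "9:30 pm",
  "The user did action A.",
  "10:30 pm",
  "Welcome, user John Doe.",
  "***This is a comment",
  "11:30 am",
  "This is some more input."
].join("\n");

// Original pattern: the capturing group inside the lookahead makes split()
// add the captured text to the result as extra elements.
const withCapture = text.split(/\s*(?=(\b\d+:\d+|\*\*\*))/);

// Same pattern without the group: only the pieces between separators remain.
const entries = text.split(/\s*(?=\b\d+:\d+|\*\*\*)/);

// Collapse internal whitespace (including newlines) for readability.
const normalized = entries.map((entry) => entry.replace(/\s+/g, " "));
console.log(normalized);

// The variant from the EDIT: no newline after the second timestamp,
// and a description that spans several lines.
const editText =
  "9:30 pm\nThe user did action A.\nHe also did action B\n10:30 pm Welcome, user John Doe.";
const editEntries = editText
  .split(/\s*(?=\b\d+:\d+|\*\*\*)/)
  .map((entry) => entry.replace(/\s+/g, " "));
console.log(editEntries);
```

Because the pattern splits in front of a timestamp or *** rather than on newlines, it should not matter whether the description follows on the same line or is spread over several.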
Minor point
Instead of \*\*\*, you could use [*]{3}. This may be more readable. The * is not a metacharacter inside a character class definition, so it doesn’t have to be escaped. The {3} is how you denote “exactly 3 repetitions of”.
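As a quick check that the two spellings behave the same, the sketch below tests both against a few made-up strings (the samples array and variable names are illustrative only).

```javascript
// Two equivalent ways to match three literal asterisks.
const escaped = /\*\*\*/;
const charClass = /[*]{3}/;

// Hypothetical sample strings for comparison.
const samples = ["***This is a comment", "**not enough", "no stars"];

// For each sample, record whether each pattern matches.
const results = samples.map((s) => [escaped.test(s), charClass.test(s)]);
console.log(results);
```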