I have the following string:
<script>m('02:29:1467301/>Sender1*>some text message?<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>
N.B. messages are separated by <<
I need to extract the following parts from each message:
1. Time
2. Sender
3. Recipient
4. Text
The recipient may or may not be present; this field is optional.
I do this with the following pattern:
(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>(?<messageData>(?<sender>.+?)\*>(.+?)))<<
But I cannot extract the recipient separately from the message text. Here is my attempt:
(?<message>(?<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>(?<messageData>(?<sender>.+?)\*>(((?<recipient>.+?):){0,1}(?<messageText>.+?))))<<
N.B. The first message has no recipient.
Please help me correct my pattern.
The <recipient> group pattern needs to exclude < and :, or else it will match the text between *> and the timestamp’s first colon when the recipient is omitted (as in the first message of your example).
A simple tweak to that group pattern should fix it:
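As a minimal sketch of that tweak (not necessarily the exact pattern intended above), the recipient group can be written as a negated character class, [^<:]+. The check below uses Python, whose re module spells named groups (?P<name>...) rather than (?<name>...). The IGNORECASE flag is an assumption, added only because the second sample timestamp contains an uppercase N that [0-9a-z] would otherwise reject.

```python
import re

# Sample input copied from the question.
data = ("<script>m('02:29:1467301/>Sender1*>some text message?"
        "<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A"
        "<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>")

# The question's second pattern, translated to Python named-group syntax.
broken = re.compile(
    r"(?P<message>(?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>"
    r"(?P<messageData>(?P<sender>.+?)\*>"
    r"(((?P<recipient>.+?):){0,1}(?P<messageText>.+?))))<<",
    re.IGNORECASE,  # assumption: lets [0-9a-z] accept the "N" in 13625N1
)

# Same pattern with only the recipient group changed:
# .+? becomes [^<:]+ and {0,1} becomes ?.
fixed = re.compile(
    r"(?P<message>(?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>"
    r"(?P<messageData>(?P<sender>.+?)\*>"
    r"(((?P<recipient>[^<:]+):)?(?P<messageText>.+?))))<<",
    re.IGNORECASE,  # same assumption as above
)

def extract(pattern, text):
    """Return (time, sender, recipient, messageText) for every match."""
    return [(m["time"], m["sender"], m["recipient"], m["messageText"])
            for m in pattern.finditer(text)]

broken_rows = extract(broken, data)
fixed_rows = extract(fixed, data)

for row in fixed_rows:
    print(row)
```

With the original recipient group, the first match runs on into the second message; with the negated class, the first message yields no recipient (None) and the second yields Recipient2. The third chunk of the sample uses => instead of /> after the timestamp, so neither pattern matches it.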
Note I replaced {0,1} with the optional quantifier (?). It’s just shorthand to improve readability (a little goes a long way). 🙂
Speaking of readability, here it is in multi-line form:
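One way to lay the pattern out over several lines, sketched here in Python, is the re.VERBOSE flag, which ignores unescaped whitespace inside the pattern. The layout below is an illustration, not necessarily the one originally posted; the named groups use Python's (?P<name>...) spelling, the recipient group is assumed to be [^<:]+, and IGNORECASE is again an assumption to cover the uppercase N in the sample timestamp.

```python
import re

# Sample input copied from the question.
data = ("<script>m('02:29:1467301/>Sender1*>some text message?"
        "<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A"
        "<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>")

# re.VERBOSE ignores the indentation and line breaks in the pattern.
multi_line = re.compile(r"""
    (?P<message>
      (?P<time> \d{1,2} : \d{1,2} : [0-9a-z]+ )
      />
      (?P<messageData>
        (?P<sender> .+? )
        \*>
        (
          ( (?P<recipient> [^<:]+ ) : )?
          (?P<messageText> .+? )
        )
      )
    )
    <<
    """, re.VERBOSE | re.IGNORECASE)  # IGNORECASE is an assumption

verbose_rows = [(m["time"], m["sender"], m["recipient"], m["messageText"])
                for m in multi_line.finditer(data)]
for row in verbose_rows:
    print(row)
```

Other regex flavors offer a comparable free-spacing or "ignore pattern whitespace" mode, so the same layout idea should carry over.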
I don’t know if the unnamed group containing <recipient> and <messageText> was intentional, but it’s unnecessary. You can break it down to this:
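A possible shape for the trimmed-down pattern, again as a Python sketch under the same assumptions ((?P<name>...) syntax, [^<:]+ for the recipient, IGNORECASE for the uppercase N): the outer unnamed group is dropped, and the small group that makes "recipient plus colon" optional is written as non-capturing, (?:...). Making it non-capturing is a choice of this sketch, not something stated above.

```python
import re

# Sample input copied from the question.
data = ("<script>m('02:29:1467301/>Sender1*>some text message?"
        "<<02:29:13625N1/>Sender2*>Recipient2: another message??<>A"
        "<<02:29:1393100=>User1*|0User2*|%></B><<','');</script>")

# No unnamed capturing groups remain; (?:...) only groups "recipient:"
# so that the whole prefix can be optional.
simplified = re.compile(
    r"(?P<message>(?P<time>\d{1,2}:\d{1,2}:[0-9a-z]+)/>"
    r"(?P<messageData>(?P<sender>.+?)\*>"
    r"(?:(?P<recipient>[^<:]+):)?(?P<messageText>.+?)))<<",
    re.IGNORECASE,  # assumption: accepts the "N" in 13625N1
)

simplified_rows = [m.groupdict() for m in simplified.finditer(data)]
for row in simplified_rows:
    print(row["time"], row["sender"], row["recipient"], row["messageText"])
```

Every remaining capturing group is named, so groupdict() returns all extracted fields, with recipient set to None when it is absent.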