
Question


Asked: May 11, 2026 (2026-05-11T09:20:50+00:00)

Given this text: /* F004 (0309)00 */ /* field 1 */ /* field 2


Given this text:

    /* F004 (0309)00 */
    /* field 1 */
    /* field 2 */
    /* F004 (0409)00 */
    /* field 1 */
    /* field 2 */

how do I parse it into this array:
    [ ['F004'],['0309'],['/* field 1 */\n/* field 2 */'], ['F004'],['0409'],['/* field 1 */\n/* field 2 */'] ]

I got code working to parse the first two items:

    form = /\/\*\s+(\w+)\s+\((\d{4})\)[0]{2}\s+\*\//m
    text.scan(form)

    => [['F004', '0309'], ['F004', '0409']]
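A minimal runnable sketch of the part that already works, assuming the sample text has one comment per line with no indentation (its original line breaks were lost in this copy). With capture groups, String#scan returns one array of captures per match:

```ruby
# Assumed sample input: one comment per line, no indentation.
text = <<~TEXT
  /* F004 (0309)00 */
  /* field 1 */
  /* field 2 */
  /* F004 (0409)00 */
  /* field 1 */
  /* field 2 */
TEXT

# Header pattern from the question: form id, a four-digit date, then "00".
form = /\/\*\s+(\w+)\s+\((\d{4})\)[0]{2}\s+\*\//m

# Each match yields [form_id, date]; the field comments are not matched at all.
headers = text.scan(form)
p headers  # => [["F004", "0309"], ["F004", "0409"]]
```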

And here’s the code where I try to parse all three, which fails with an invalid regex error:

    form = /\/\*\s+(\w+)\s+\((\d{4})\)[0]{2}\s+\*\//m
    form_and_fields = /#{form}(.[^#{form}]+)/m
    text.scan(form_and_fields)

Edit: This is what ended up working for me, thanks to both rampion and singpolyma:

    form = /
      \/\*\s+(\w+)\s+\((\d+)\)\d+\s+\*\/    #formId & edDate
      (.+?)                                 #fieldText
      (?=\/\*\s+\w+\s+\(\d+\)\d+\s+\*\/|\Z) #stop at beginning of next form
                                            # or the end of the string
    /mx
    text.scan(form)
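To check the edited regex end to end, here is a sketch that runs it on the same assumed one-comment-per-line input and strips the surrounding newlines from each field block. It produces one [id, date, fields] triple per form rather than the flat list of one-element arrays shown in the question; the strip step is an addition, not part of the asker's code:

```ruby
# Assumed sample input: one comment per line, no indentation.
text = <<~TEXT
  /* F004 (0309)00 */
  /* field 1 */
  /* field 2 */
  /* F004 (0409)00 */
  /* field 1 */
  /* field 2 */
TEXT

# The asker's final pattern; x mode allows whitespace and comments inside it.
form = /
  \/\*\s+(\w+)\s+\((\d+)\)\d+\s+\*\/     # form id and edition date
  (.+?)                                 # field text, matched lazily
  (?=\/\*\s+\w+\s+\(\d+\)\d+\s+\*\/|\Z) # stop before the next header or at the end
/mx

# \Z can stop before a final newline, so the raw field text may or may not
# end in "\n"; strip makes both forms come out the same.
forms = text.scan(form).map { |id, date, fields| [id, date, fields.strip] }
p forms
```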


1 Answer

score 0 · Answer 1 · 2026-05-11T09:20:51+00:00

You seem to be misunderstanding how character classes (e.g. [a-f0-9] or [^aeiouy]) work. /[^abcd]/ doesn’t negate the pattern abcd; it says ‘match any single character that’s not a, b, c, or d’.
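A small illustration of that point: a negated character class excludes individual characters, so it says nothing about whether the word abcd appears:

```ruby
# [^abcd] matches any one character other than a, b, c, or d.
outside = "abcdxyz".scan(/[^abcd]/)
p outside   # => ["x", "y", "z"]

# "dcba" does not contain the string "abcd", yet nothing matches,
# because every one of its characters is in the excluded set.
reversed = "dcba".scan(/[^abcd]/)
p reversed  # => []
```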

If you want to match the negation of a pattern, use the /(?!pattern)/ construct (a negative lookahead). It’s a zero-width match: it doesn’t consume any characters, it matches a position. This is similar to how /^/ and /$/ match the start and end of a line, or /\b/ matches a word boundary. For instance, /(?!xx)/ matches every position where the pattern ‘xx’ doesn’t start.
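To see that /(?!xx)/ matches positions rather than characters, this sketch records the offset of every (empty) match in a short test string:

```ruby
# Every match of (?!xx) is empty; $~ holds the current MatchData in the block.
positions = []
"axxb".scan(/(?!xx)/) { positions << $~.begin(0) }

# Offset 1 is skipped because "xx" starts there. Offset 4 is the end of
# the string, which is also a position.
p positions  # => [0, 2, 3, 4]
```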

In general, then, after a negative lookahead you need to match some character to move forward in the string.
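A minimal sketch of that idea with a simpler pattern than form: check the lookahead, consume one character with ".", and repeat:

```ruby
# (?:(?!xx).)+ collects runs of characters that never start an "xx".
runs = "abxxcd".scan(/(?:(?!xx).)+/)
p runs  # => ["ab", "xcd"]
```

The first run stops before the "xx". The second run can start at the second x, because "xc" is not "xx", so only the starting position of the forbidden pattern is excluded.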

So to use your pattern:

    form = /\/\*\s+(\w+)\s+\((\d{4})\)[0]{2}\s+\*\//m
    form_and_fields = /#{form}((?:(?!#{form}).)+)/m
    text.scan(form_and_fields)

From the inside out (I’ll use (?#comments) as placeholders for other subpatterns):

(?!#{form}) negates your original pattern, so it matches any position where your original pattern can’t start.
(?:(?!#{form}).)+ means: at such a position, match one character, then try again, as many times as possible but at least once. (?:(?#whatever)) is a non-capturing group, useful for grouping without creating a capture.

In irb, this gives:

    irb> text.scan(form_and_fields)
    => [['F004', '0309', '  \n    /* field 1 */  \n    /* field 2 */  \n    ', nil, nil], ['F004', '0409', '  \n    /* field 1 */  \n    /* field 2 */  \n', nil, nil]]

The extra nils come from the two capturing groups inside form, which are repeated in the negative lookahead (?!#{form}). A negative lookahead only succeeds when its pattern fails to match, so those groups never capture anything.
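One way to avoid those nils, sketched here as an alternative rather than the answer's own code, is to put a capture-free copy of the header pattern in the lookahead. The name header is hypothetical, and the input again assumes one comment per line:

```ruby
# Assumed sample input: one comment per line, no indentation.
text = <<~TEXT
  /* F004 (0309)00 */
  /* field 1 */
  /* field 2 */
  /* F004 (0409)00 */
  /* field 1 */
  /* field 2 */
TEXT

# Header with captures, used once to pull out the id and date.
form = /\/\*\s+(\w+)\s+\((\d{4})\)[0]{2}\s+\*\//m
# Hypothetical capture-free copy, used only inside the negative lookahead.
header = /\/\*\s+\w+\s+\(\d{4}\)0{2}\s+\*\//

form_and_fields = /#{form}((?:(?!#{header}).)+)/m
result = text.scan(form_and_fields)
p result  # three captures per match, no trailing nils
```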

This can be cleaned up a bit:

    form_and_fields = /#{form}\s*(.+?)\s*(?:(?=#{form})|\Z)/m
    text.scan(form_and_fields)

Now, instead of a zero-width negative lookahead, we use a zero-width positive lookahead (?=#{form}) to match the position of the next occurrence of form. So this regex matches everything up to the next occurrence of form, without including that occurrence in the match. This lets us trim the whitespace around the fields with \s*. We also have to handle the case where we hit the end of the string, /\Z/, since that can happen too.
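A runnable version of the positive-lookahead regex, again assuming one comment per line in the input. Because the lookahead contains the interpolated form with its two capture groups, each match still has five entries:

```ruby
# Assumed sample input: one comment per line, no indentation.
text = <<~TEXT
  /* F004 (0309)00 */
  /* field 1 */
  /* field 2 */
  /* F004 (0409)00 */
  /* field 1 */
  /* field 2 */
TEXT

form = /\/\*\s+(\w+)\s+\((\d{4})\)[0]{2}\s+\*\//m

# Lazily take the field text, trim surrounding whitespace with \s*, and stop
# where the next header begins (positive lookahead) or at the end (\Z).
form_and_fields = /#{form}\s*(.+?)\s*(?:(?=#{form})|\Z)/m
result = text.scan(form_and_fields)
p result
```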

In irb:

    irb> text.scan(form_and_fields)
    => [['F004', '0309', '/* field 1 */  \n    /* field 2 */', 'F004', '0409'], ['F004', '0409', '/* field 1 */  \n    /* field 2 */', nil, nil]]

Note that the last two fields are now populated in the first match, because the capturing groups inside the zero-width positive lookahead did match something, even though that text wasn’t consumed. Since it wasn’t consumed, the same header can be matched again at the start of the second match.
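If only the id, date, and field text are wanted, a simple option (one choice among several) is to drop everything after the third capture, since the rest comes from the lookahead’s copy of form. Same assumed input as before:

```ruby
# Assumed sample input: one comment per line, no indentation.
text = <<~TEXT
  /* F004 (0309)00 */
  /* field 1 */
  /* field 2 */
  /* F004 (0409)00 */
  /* field 1 */
  /* field 2 */
TEXT

form = /\/\*\s+(\w+)\s+\((\d{4})\)[0]{2}\s+\*\//m
form_and_fields = /#{form}\s*(.+?)\s*(?:(?=#{form})|\Z)/m

# The anonymous splat swallows the lookahead's extra captures
# (a copy of the next header's id and date, or nil, nil at the end).
forms = text.scan(form_and_fields).map { |id, date, fields, *| [id, date, fields] }
p forms
```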
