I have the following regular expression for data validation: lexer = /(?: (.{18}|(?:.*)(?=\s\S{2,})|(?:[^\s+]\s){1,})\s* (.{18}|(?:.*)(?=\s\S{2,})|(?:[^\s+]\s){1,})\s*

Asked: May 22, 2026 (2026-05-22T20:48:27+00:00)

I have the following regular expression for data validation:

lexer = /(?:
      (.{18}|(?:.*)(?=\s\S{2,})|(?:[^\s+]\s){1,})\s*
      (.{18}|(?:.*)(?=\s\S{2,})|(?:[^\s+]\s){1,})\s*
      (?:\s+([A-Za-z][A-Za-z0-9]{2}(?=\s))|(\s+))\s*
      (Z(?:RO[A-DHJ]|EQ[A-C]|HIB|PRO|PRP|RMA)|H(?:IB[2E]|ALB)|F(?:ER[2T]|LUP2|ST4Q))\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\s+\d{10}|\s+)\s*
      (\d{6})\s*
      (.*)(?=((?:\d{2}\/){2}\d{4}))\s*
      ((?:\d{2}\/){2}\d{4})\s*
      (\S+)
    )/x

The problem is that I have to iterate through a file of about 10,000 lines on average, validating each line with this regular expression, and the resulting parser is slow.

  filename = File.new(@file, "r")
  filename.each_line.with_index do |line, index|
    next if index < INFO_AT + 1

    lexer = /(?:
      (.{18}|(?:.*)(?=\s\S{2,})|(?:[^\s+]\s){1,})\s*
      (.{18}|(?:.*)(?=\s\S{2,})|(?:[^\s+]\s){1,})\s*
      (?:\s+([A-Za-z][A-Za-z0-9]{2}(?=\s))|(\s+))\s*
      (Z(?:RO[A-DHJ]|EQ[A-C]|HIB|PRO|PRP|RMA)|H(?:IB[2E]|ALB)|F(?:ER[2T]|LUP2|ST4Q))\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\S+)\s*
      (\s+\d{10}|\s+)\s*
      (\d{6})\s*
      (.*)(?=((?:\d{2}\/){2}\d{4}))\s*
      ((?:\d{2}\/){2}\d{4})\s*
      (\S+)
    )/x

    m = lexer.match(line)
    begin
      if (m) then ...

Edit
Here you can find some of the lines that I need to parse: File

Edit II
@Mike R

I’m parsing a file that contains 25 columns per line, and each column might have its own validation rule: it could be anything from whitespace to a full character set.

That validation is required, since I have to drop any line that doesn’t match that kind of part.
“Might not be necessary”: it is necessary.

I don’t believe that the expression is badly constructed. The lookahead is used; maybe you mean the part where I repeated the code (I just didn’t remember the capturing-group index \1…\n, if this is what you mean!). I also believe that catastrophic backtracking is happening here.

If you see the file, maybe you’ll understand why I’m doing this! Take the first column as an example. I have to match a “Part Number” and there is no rule for what one looks like. Examples:

123456789
1 555 989
0123456789123456789

Neither a simple \S+ nor (\S+\s){1,} solves this problem, because neither would guarantee data integrity.

Ty!

Any improvement, suggestion?

~ Eder Quiñones

1 Answer

Editorial Team · Answer 1 · 2026-05-22T20:48:28+00:00

Your file uses a fixed-width field format. Ruby's String#unpack method is well suited to parsing this type of file.

field_widths = [19,41,14,11,11,11,11,11,11,11] # etc.
field_pattern = "A#{field_widths.join('A')}"

Then in your line loop:

row = line.unpack(field_pattern)

Now you have an array (row) with the contents of each field. You can then apply a regex to each one for validation. This is faster, more manageable, and also allows for field-specific error messages.
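
As a rough sketch of the per-field validation described above: the three-column layout, the widths of the second and third columns, the FIELDS table, and the validate_line helper are all hypothetical and would need to be replaced with the real 25-column layout. The type-code rule reuses the alternation from the question's regex. Note that the A directive counts bytes, so the widths assume single-byte characters in the data.

```ruby
# Hypothetical layout: names, widths, and rules are illustrative assumptions,
# not the real file's columns.
FIELDS = [
  # Non-blank value; inner spaces are allowed (e.g. "1 555 989").
  { name: :part_number, width: 19, rule: /\A\S(?:.*\S)?\z/ },
  # Type codes copied from the alternation in the question's regex.
  { name: :type_code, width: 6,
    rule: /\A(?:Z(?:RO[A-DHJ]|EQ[A-C]|HIB|PRO|PRP|RMA)|H(?:IB[2E]|ALB)|F(?:ER[2T]|LUP2|ST4Q))\z/ },
  # Date in the form dd/dd/dddd.
  { name: :date, width: 10, rule: %r{\A(?:\d{2}/){2}\d{4}\z} }
].freeze

# Builds "A19A6A10": each A<n> takes n bytes and strips trailing spaces.
FIELD_PATTERN = FIELDS.map { |f| "A#{f[:width]}" }.join

# Splits one line into fields and checks each against its own rule.
# Returns [row, errors]; errors is empty when every field is valid.
def validate_line(line)
  values = line.chomp.unpack(FIELD_PATTERN)
  row = {}
  errors = []
  FIELDS.zip(values) do |field, value|
    value = value.to_s
    row[field[:name]] = value
    unless field[:rule].match?(value)
      errors << "#{field[:name]}: invalid value #{value.inspect}"
    end
  end
  [row, errors]
end

# Usage: one well-formed line and one with a blank part number and bad code.
good_line = "1 555 989".ljust(19) + "ZPRO".ljust(6) + "05/22/2026\n"
bad_line  = (" " * 19) + "XXXX".ljust(6) + "05/22/2026\n"

[good_line, bad_line].each do |line|
  row, errors = validate_line(line)
  if errors.empty?
    puts "ok: #{row[:part_number]}"
  else
    puts "dropped: #{errors.join('; ')}"
  end
end
```

Because each rule is anchored and only sees its own field, a failure in one column cannot trigger backtracking across the rest of the line, and the error message names the column that failed.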
