I am parsing a file with long lines, whose tokens are white space delimited.

Asked: May 26, 2026 (2026-05-26T09:24:54+00:00)

I am parsing a file with long lines, whose tokens are white space delimited. Before handling most of the line, I want to check whether the n-th (for small n) token has some value. I’ll skip most of the lines, so really there’s no need to split most of the very long lines. Is there a quick way to do a lazy split in Perl or do I need to roll my own?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T09:24:55+00:00

You can provide a limit argument to the split operator to make Perl stop splitting after a certain number of tokens have been generated.

@fields = split /\s+/, $expression, 4;

for example, will put everything after the 3rd whitespace-separated field in the 4th element of @fields. This is more efficient than doing a complete split when the expression has more than four fields.

If you do this lazy split and decide that you need to process the line further, you will need to split the line again. Depending on how long the lines are and how frequently you need to reprocess them, you could still come out ahead.
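As a rough sketch of that filter-then-resplit pattern: the helper name nth_token, the sample lines, and the 'KEEP' marker below are made up for illustration. It also assumes the special ' ' pattern rather than /\s+/, so that leading whitespace does not produce an empty first field.

```perl
use strict;
use warnings;

# Hypothetical helper: return the n-th (1-based) whitespace-delimited token.
# The limit of n+1 makes split stop early, leaving the rest of the line
# untokenized in the last element. For lines with fewer than n tokens the
# result is undef or an empty string.
sub nth_token {
    my ($line, $n) = @_;
    my @head = split ' ', $line, $n + 1;
    return $head[$n - 1];
}

# Made-up sample data standing in for the long lines of the real file.
my @lines = (
    "alpha beta gamma delta epsilon zeta",
    "one two KEEP four five six seven",
);

for my $line (@lines) {
    my $tok = nth_token($line, 3);
    next unless defined $tok && $tok eq 'KEEP';

    # Only lines that pass the cheap check pay for a complete split.
    my @fields = split ' ', $line;
    print scalar(@fields), " fields: @fields\n";
}
```

Lines that fail the check cost only a split of their first few tokens; lines that pass are split twice, which is the trade-off described above.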

Another approach may be to split a portion of the line you are interested in. For example, if the line contains many fields but you want to filter on the 4th field AND you are sure that the 4th field always occurs before the 100th byte on the line, saying

@fields = split /\s+/, substr($expression, 0, 100);
if (matches_some_condition($fields[3])) {
    # process the whole line
    @fields = split /\s+/, $expression;
    ...
}

and occasionally splitting the expression twice may be more efficient than always splitting the full expression one time.
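If you cannot be sure the field ends inside the prefix, one possible safeguard is to fall back to a limited split of the whole line whenever the prefix might have cut the token short. This is only a sketch: the helper name nth_token_prefix and its prefix-length parameter are assumptions for illustration, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: n-th (1-based) token, examining only a prefix first.
sub nth_token_prefix {
    my ($line, $n, $prefix_len) = @_;

    # Tokenize just the first $prefix_len characters, stopping after n+1 pieces.
    my @head = split ' ', substr($line, 0, $prefix_len), $n + 1;

    # The n-th token is known to be complete only if something follows it
    # inside the prefix, or if the prefix already covers the whole line.
    return $head[$n - 1] if @head > $n || length($line) <= $prefix_len;

    # Otherwise the cut may have landed inside the token, so redo the
    # limited split on the full line.
    @head = split ' ', $line, $n + 1;
    return $head[$n - 1];
}

# Example call with made-up data: the 10-character prefix cuts "ccc" short,
# so the helper falls back to the full line.
my $token = nth_token_prefix("aaa bbb ccc ddd", 3, 10);
print "$token\n";
```

Whether the extra substr call pays off compared with a plain limited split depends on your data, so it is worth benchmarking both on representative lines.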
