I am parsing a file with long lines whose tokens are whitespace-delimited. Before handling most of a line, I want to check whether the n-th token (for small n) has some value. I’ll skip most of the lines, so there’s really no need to split them in full. Is there a quick way to do a lazy split in Perl, or do I need to roll my own?
You can provide a limit argument to the split operator to make Perl stop splitting after a certain number of tokens have been generated. Splitting with a limit of 4, for example, will put everything after the 3rd whitespace-separated field, unsplit, in the 4th element of the resulting list (@list). This is more efficient than doing a complete split when the expression has more than four fields.

If you do this lazy split and decide that you need to process the line further, you will need to split the line again. Depending on how long the lines are and how frequently you need to reprocess them, you could still come out ahead.

Another approach may be to split only the portion of the line you are interested in. For example, if the line contains many fields but you want to filter on the 4th field AND you are sure that the 4th field always ends before the 100th byte of the line, splitting only the first 100 bytes (taken with substr, say) and occasionally splitting the expression twice may be more efficient than always splitting the full expression one time.
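Both ideas can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the helper names nth_token and nth_token_in_prefix, the sample line, and the value 'wanted' are made up for the example, and the pattern ' ' (split on runs of whitespace, ignoring leading whitespace) is assumed to match the file's delimiter.

```perl
use strict;
use warnings;

# A limit of 4 yields at most 4 elements; the last one holds the unsplit rest.
my @list = split ' ', 'a b c d e f', 4;
# @list is ("a", "b", "c", "d e f")

# Hypothetical helper: return the n-th (1-based) token of $line,
# splitting no further than necessary. Returns undef if there are
# fewer than n tokens.
sub nth_token {
    my ($line, $n) = @_;
    # Limit n+1 so that element n is a clean token and the remainder
    # of the line is left unsplit in element n+1.
    my @head = split ' ', $line, $n + 1;
    return $head[$n - 1];
}

# Hypothetical helper: same idea, but only the first $len characters
# are examined. Assumes the n-th token ends before that offset;
# otherwise the token comes back truncated or missing.
sub nth_token_in_prefix {
    my ($line, $n, $len) = @_;
    my @head = split ' ', substr($line, 0, $len), $n + 1;
    return $head[$n - 1];
}

# Usage: filter cheaply, and do the full split only for lines that match.
my $line  = 'x y z wanted ' . join(' ', 1 .. 1000);
my $token = nth_token($line, 4);
if (defined $token && $token eq 'wanted') {
    my @fields = split ' ', $line;    # second, complete split
    # ... process @fields ...
}
```

Whether the prefix variant is worth it depends on the data; note that substr counts characters rather than bytes when the string has been decoded, so the 100-byte bound in the text maps directly only for byte strings.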