I have a huge file that I must parse line by line. Speed is

Question

Editorial Team

Asked: May 10, 2026

I have a huge file that I must parse line by line. Speed is of the essence.

Example of a line:

    Token-1   Here-is-the-Next-Token      Last-Token-on-Line
          ^                        ^
       Current                 Position
       Position              after GetToken

When GetToken is called, it returns ‘Here-is-the-Next-Token’ and sets CurrentPosition to the position of the last character of that token, so that it is ready for the next call to GetToken. Tokens are separated by one or more spaces.

Assume the file is already in a StringList in memory. It fits in memory easily, say 200 MB.

I am worried only about the execution time for the parsing. What code will produce the absolute fastest execution in Delphi (Pascal)?

1 Answer

score 0 · Answer 1 · 2026-05-10T22:08:28+00:00

Use PChar incrementing for speed of processing
If some tokens are not needed, only copy token data on demand
Copy PChar to local variable when actually scanning through characters
Keep source data in a single buffer unless you must handle line by line, and even then, consider handling line processing as a separate token in the lexer recognizer
Consider processing a byte array buffer that has come straight from the file, if you definitely know the encoding. If using Delphi 2009, where PChar is a UTF-16 character pointer, use PAnsiChar instead of PChar, unless of course you know the encoding is UTF-16LE.
If you know that the only whitespace is going to be #32 (ASCII space), or a similarly limited set of characters, there may be some clever bit manipulation hacks that can let you process 4 bytes at a time using Integer scanning. I wouldn’t expect big wins here though, and the code will be as clear as mud.
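
To make the "copy on demand" and "line by line" points concrete, here is a minimal sketch, not a tuned implementation. NextToken and TokenIs are hypothetical helper names introduced only for this illustration. The sketch assumes tokens are separated by characters at or below #32 and that each line is read from a TStringList, as described in the question. The FPC mode directive is only there so the program can also be tried with Free Pascal.

```pascal
program TokenScanSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: advances P past one token and reports where the
// token starts and how many characters it spans. Returns False when the
// terminating #0 is reached. No character data is copied here.
function NextToken(var P: PChar; out Start: PChar; out Len: Integer): Boolean;
var
  cp: PChar;
begin
  cp := P; // local copy so the compiler can keep it in a register
  while (cp^ > #0) and (cp^ <= #32) do
    Inc(cp);
  Start := cp;
  while cp^ > #32 do
    Inc(cp);
  Len := cp - Start;
  P := cp;
  Result := Len > 0;
end;

// Hypothetical helper: compares a token in place against S, so no string
// is allocated for tokens that turn out to be uninteresting.
function TokenIs(Start: PChar; Len: Integer; const S: string): Boolean;
begin
  Result := (Len = Length(S)) and
    CompareMem(Start, Pointer(S), Len * SizeOf(Char));
end;

var
  Lines: TStringList;
  Line, Wanted: string;
  P, TokStart: PChar;
  TokLen, I, Total, Hits: Integer;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('Token-1   Here-is-the-Next-Token      Last-Token-on-Line');
    Lines.Add('  alpha Token-1  beta ');
    Wanted := 'Token-1';
    Total := 0;
    Hits := 0;
    for I := 0 to Lines.Count - 1 do
    begin
      Line := Lines[I]; // string assignment shares the data by reference count
      P := PChar(Line);
      while NextToken(P, TokStart, TokLen) do
      begin
        Inc(Total);
        if TokenIs(TokStart, TokLen, Wanted) then
          Inc(Hits);
      end;
    end;
    Writeln('tokens: ', Total, ', matches: ', Hits);
  finally
    Lines.Free;
  end;
end.
```

Only a token that the caller actually needs as a string would then be materialised, for example with SetString(S, TokStart, TokLen). Whether this beats scanning one large buffer depends on the data, so it is worth timing both.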

Here’s a sample lexer that should be pretty efficient, but it assumes that all source data is in a single string. Reworking it to handle buffers is moderately tricky due to very long tokens.

    type
      TLexer = class
      private
        FData: string;
        FTokenStart: PChar;
        FCurrPos: PChar;
        function GetCurrentToken: string;
      public
        constructor Create(const AData: string);
        function GetNextToken: Boolean;
        property CurrentToken: string read GetCurrentToken;
      end;

    { TLexer }

    constructor TLexer.Create(const AData: string);
    begin
      FData := AData;
      FCurrPos := PChar(FData);
    end;

    function TLexer.GetCurrentToken: string;
    begin
      // the token is copied only when the caller asks for it
      SetString(Result, FTokenStart, FCurrPos - FTokenStart);
    end;

    function TLexer.GetNextToken: Boolean;
    var
      cp: PChar;
    begin
      cp := FCurrPos; // copy to local to permit register allocation

      // skip whitespace; this test could be converted to an unsigned int
      // subtraction and compare for only a single branch
      while (cp^ > #0) and (cp^ <= #32) do
        Inc(cp);

      // using null terminator for end of file
      Result := cp^ <> #0;

      if Result then
      begin
        FTokenStart := cp;
        Inc(cp);
        while cp^ > #32 do
          Inc(cp);
      end;

      FCurrPos := cp;
    end;
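
The comment in the whitespace loop mentions replacing the two comparisons with an unsigned subtraction and a single compare. A rough sketch of that idea follows, with the assumptions stated up front: the function names are made up for this illustration, range and overflow checking are switched off explicitly, and whether the compiler really emits one branch has to be checked in the generated code. The program only verifies that both tests agree for the characters #0..#255.

```pascal
program SingleBranchSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$R-,Q-} // assumption: the trick relies on unchecked unsigned wrap-around

// Two comparisons, as written in the lexer: True for #1..#32.
function IsSkippableTwoTests(C: Char): Boolean;
begin
  Result := (C > #0) and (C <= #32);
end;

// One comparison: subtracting 1 maps #1..#32 onto 0..31, while #0 becomes
// -1, which is the largest Cardinal once reinterpreted as unsigned.
function IsSkippableOneTest(C: Char): Boolean;
begin
  Result := Cardinal(Ord(C) - 1) < 32;
end;

var
  I, Mismatches: Integer;
begin
  Mismatches := 0;
  for I := 0 to 255 do
    if IsSkippableTwoTests(Chr(I)) <> IsSkippableOneTest(Chr(I)) then
      Inc(Mismatches);
  // a count of zero means the two tests are interchangeable on this range
  Writeln('mismatches: ', Mismatches);
end.
```

In the lexer, the loop condition would then read `while Cardinal(Ord(cp^) - 1) < 32 do`. This is harder to read than the original, so it only makes sense if profiling shows the whitespace loop matters.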
