Editorial Team
Asked: May 10, 2026


I have a huge file that I must parse line by line. Speed is of the essence.

Example of a line:

    Token-1   Here-is-the-Next-Token      Last-Token-on-Line
          ^                        ^
       Current                 Position
       Position              after GetToken

GetToken is called; it returns ‘Here-is-the-Next-Token’ and sets CurrentPosition to the position of the last character of that token, so that it is ready for the next call to GetToken. Tokens are separated by one or more spaces.

Assume the file is already in a StringList in memory. It fits in memory easily, say 200 MB.

I am worried only about the execution time for the parsing. What code will produce the absolute fastest execution in Delphi (Pascal)?

1 Answer

  1. Added an answer on May 10, 2026 at 10:08 pm
    • Scan by incrementing a PChar rather than indexing into the string.
    • If some tokens are not needed, copy token data only on demand.
    • Copy the PChar to a local variable before the loop that scans through characters, so the compiler can keep it in a register.
    • Keep the source data in a single buffer unless you must handle it line by line, and even then, consider treating the line break as a separate token in the lexer's recognizer.
    • Consider processing a byte array buffer that has come straight from the file, if you definitely know the encoding. In Delphi 2009, where PChar is a PWideChar, use PAnsiChar instead of PChar, unless of course you know the encoding is UTF-16LE.
    • If you know that the only whitespace is going to be #32 (ASCII space), or a similarly limited set of characters, there may be some clever bit manipulation hacks that let you process 4 bytes at a time using Integer scanning. I wouldn’t expect big wins here though, and the code will be as clear as mud.
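The "copy only on demand" advice above can be read as follows. This is a minimal sketch under my own assumptions, not the answerer's code: a token is described by a start pointer and a length, compared in place, and turned into a string only when one is actually needed. TTokenSpan, MakeSpan, SpanEquals and SpanToString are hypothetical names introduced for illustration, and the {$MODE DELPHI} line only matters when compiling with Free Pascal.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  // Hypothetical helper type: a token as (start, length), no string allocated.
  TTokenSpan = record
    Start: PChar;
    Len: Integer;
  end;

// Build a span over part of AText. The caller must keep AText alive and
// unmodified for as long as the span is in use.
function MakeSpan(const AText: string; AOffset, ALen: Integer): TTokenSpan;
begin
  Result.Start := PChar(AText) + AOffset;
  Result.Len := ALen;
end;

// Compare the span with AText character by character; nothing is copied.
function SpanEquals(const ASpan: TTokenSpan; const AText: string): Boolean;
var
  p: PChar;
  i: Integer;
begin
  Result := ASpan.Len = Length(AText);
  if not Result then
    Exit;
  p := PChar(AText);
  for i := 0 to ASpan.Len - 1 do
    if ASpan.Start[i] <> p[i] then
    begin
      Result := False;
      Exit;
    end;
end;

// Allocate a string for the token only when the caller really needs one.
function SpanToString(const ASpan: TTokenSpan): string;
begin
  SetString(Result, ASpan.Start, ASpan.Len);
end;
```

These declarations can be pasted above the `begin` of a console program. A parser that only needs to recognise a few keywords could call SpanEquals on every token and SpanToString on the handful it keeps.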

    Here’s a sample lexer that should be pretty efficient, but it assumes that all source data is in a single string. Reworking it to handle buffers is moderately tricky due to very long tokens.

    type
      TLexer = class
      private
        FData: string;
        FTokenStart: PChar;
        FCurrPos: PChar;
        function GetCurrentToken: string;
      public
        constructor Create(const AData: string);
        function GetNextToken: Boolean;
        property CurrentToken: string read GetCurrentToken;
      end;

    { TLexer }

    constructor TLexer.Create(const AData: string);
    begin
      FData := AData;
      FCurrPos := PChar(FData);
    end;

    function TLexer.GetCurrentToken: string;
    begin
      SetString(Result, FTokenStart, FCurrPos - FTokenStart);
    end;

    function TLexer.GetNextToken: Boolean;
    var
      cp: PChar;
    begin
      cp := FCurrPos; // copy to local to permit register allocation

      // skip whitespace; this test could be converted to an unsigned int
      // subtraction and compare for only a single branch
      while (cp^ > #0) and (cp^ <= #32) do
        Inc(cp);

      // using null terminator for end of file
      Result := cp^ <> #0;

      if Result then
      begin
        FTokenStart := cp;
        Inc(cp);
        while cp^ > #32 do
          Inc(cp);
      end;

      FCurrPos := cp;
    end;
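The comment in the whitespace loop mentions replacing the two-sided test with an unsigned subtraction and a single compare. A minimal sketch of that idea, assuming the same convention as the lexer (anything from #1 to #32 is whitespace, #0 ends the data), could look like this. IsBlank and CountTokens are hypothetical names, not part of the answer, and whether the single compare is measurably faster would need to be checked with your own compiler and data.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// True for #1..#32, False for #0 and for everything above #32.
// Ord(c) - 1 is -1 when c = #0; cast to Cardinal that becomes a very large
// value, so one unsigned comparison rejects both ends of the range.
function IsBlank(c: Char): Boolean;
begin
  Result := Cardinal(Ord(c) - 1) < 32;
end;

// Count whitespace-separated tokens with the same scanning scheme as the
// lexer: a local PChar, the #0 terminator as end of data.
function CountTokens(const AData: string): Integer;
var
  cp: PChar;
begin
  Result := 0;
  cp := PChar(AData); // local copy; points at #0 for an empty string
  while True do
  begin
    while IsBlank(cp^) do // skip whitespace with a single compare
      Inc(cp);
    if cp^ = #0 then      // terminator reached: no more tokens
      Break;
    Inc(Result);
    while cp^ > #32 do    // advance to the end of the token
      Inc(cp);
  end;
end;
```

Called on the example line from the question, CountTokens returns 3. Like the lexer above, it stops at the first #0, so data containing embedded null characters would be cut short.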