The Archive Base

Editorial Team
Asked: May 17, 2026 (2026-05-17T17:06:09+00:00)

I feel like this is a pretty common problem but I wasn’t really sure what to search for.

I have a large file (so I don’t want to load it all into memory) that I need to parse control strings out of and then stream that data to another computer. I’m currently reading the file in 1000-byte chunks.

So, for example, suppose ASCII codes are escaped as ‘$’, some number of digits, then ‘;’, and the data looks like this: “quick $33;brown $126;fox $a $12a”. The string going to the other computer would be “quick !brown ~fox $a $12a”.

In my current approach I have the following problems:

  • What happens when a control string falls on a buffer boundary?
  • If the string is ‘$’ followed by anything but digits and a ‘;’ I want to ignore it. So I need to read ahead until the full control string is found.

I’m writing this in straight C so I don’t have streams to help me.

Would an alternating double-buffer approach work, and if so, how does one manage the current locations, etc.?

1 Answer

  1. Editorial Team
    Added an answer on May 17, 2026 at 5:06 pm

    If I’ve followed what you are asking, this is lexical analysis (tokenization), and the patterns involved can be described with regular expressions. For a regular language you can construct a finite state machine that recognizes your input. In practice you can use a tool that understands regular expressions to recognize the input and perform a different action for each pattern.

    How you go about this depends on your requirements. For more complicated languages you might want a tool like lex to generate an input processor, but for this, as I understand it, a much simpler approach will do once we fix your buffer problem.

    You should use a circular buffer for your input, so that indexing off the end wraps around to the front again. Whenever half of the data that the buffer can hold has been processed, do another read to refill that half. Your buffer size should be at least twice as large as the largest “word” you need to recognize. Indexing into this buffer uses the modulus (remainder) operator % to perform the wrapping (if you choose a buffer size that is a power of 2, such as 4096, you can use bitwise & instead).
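
The circular buffer described above might be sketched in C as follows. This is only an illustration under stated assumptions, not a definitive implementation: the names `ring`, `ring_fill`, `ring_peek`, `ring_consume` and `ring_used` are hypothetical, the size is fixed at 4096 so that `&` can replace `%`, and the input is assumed to come from a `FILE *` read with `fread`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define RING_SIZE 4096u               /* assumption: must be a power of two */
#define RING_MASK (RING_SIZE - 1u)    /* x & RING_MASK == x % RING_SIZE */

typedef struct {
    unsigned char data[RING_SIZE];
    size_t head;   /* total bytes read into the buffer (not wrapped) */
    size_t tail;   /* total bytes consumed by the parser (not wrapped) */
} ring;

/* Number of unread bytes currently held. */
static size_t ring_used(const ring *r)
{
    return r->head - r->tail;
}

/* Look at the byte i positions past the read position, wrapping as needed. */
static unsigned char ring_peek(const ring *r, size_t i)
{
    assert(i < ring_used(r));
    return r->data[(r->tail + i) & RING_MASK];
}

/* Mark n bytes as processed so their space can be refilled. */
static void ring_consume(ring *r, size_t n)
{
    assert(n <= ring_used(r));
    r->tail += n;
}

/* Read from fp until the buffer is full or input runs out.
   Returns the number of bytes added. */
static size_t ring_fill(ring *r, FILE *fp)
{
    size_t added = 0;
    while (ring_used(r) < RING_SIZE) {
        size_t pos = r->head & RING_MASK;            /* write index */
        size_t room = RING_SIZE - ring_used(r);      /* free bytes overall */
        size_t contiguous = RING_SIZE - pos;         /* free bytes before the wrap */
        size_t want = room < contiguous ? room : contiguous;
        size_t got = fread(r->data + pos, 1, want, fp);
        r->head += got;
        added += got;
        if (got < want)
            break;                                   /* end of file or read error */
    }
    return added;
}
```

A caller would typically call `ring_fill` again once `ring_used(&r)` drops below `RING_SIZE / 2`, which matches the half-buffer refill described above. Because a token is inspected through `ring_peek`, it can straddle the physical end of the array without special handling.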

    Now you scan characters until you reach a $ and output everything you have scanned up to that point. Seeing the $ puts you in a different state: you keep reading until a character ends that state (the ;), then act on the data collected in between. Your question does not make entirely clear how to handle a $ that is not followed by a well-formed number and a ;. For instance, what should happen if a million digits arrive before the ;?
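
The two states described above could be hand-coded roughly like this. It is a sketch under assumptions rather than the definitive approach: the names `unescaper`, `unescape_feed` and `unescape_finish` are hypothetical, escapes are assumed to hold at most 3 digits with a value of at most 255, and anything else after a $ is passed through unchanged. Note that this variant keeps its state in a small struct between calls, so an escape split across two chunks is handled without a circular buffer; that is a substitute for, not a copy of, the buffering scheme above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIGITS 3   /* assumption: escapes look like $0; through $255; */

typedef struct {
    char held[MAX_DIGITS + 1];  /* '$' plus the digits seen so far */
    size_t len;                 /* 0 means "not inside an escape" */
    int value;                  /* numeric value of the digits held */
} unescaper;

/* Copy the held bytes to out unchanged; returns the new output length. */
static size_t flush_held(unescaper *u, char *out, size_t o)
{
    memcpy(out + o, u->held, u->len);
    o += u->len;
    u->len = 0;
    u->value = 0;
    return o;
}

/* Process one chunk of input. out needs room for n + MAX_DIGITS + 1 bytes.
   Returns the number of bytes written to out. */
static size_t unescape_feed(unescaper *u, const char *in, size_t n, char *out)
{
    size_t o = 0;
    assert(u != NULL && in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        char c = in[i];
        if (u->len > 0) {                        /* state: after '$' */
            if (c >= '0' && c <= '9' && u->len <= MAX_DIGITS) {
                u->held[u->len++] = c;           /* still collecting digits */
                u->value = u->value * 10 + (c - '0');
                continue;
            }
            if (c == ';' && u->len > 1 && u->value <= 255) {
                out[o++] = (char)u->value;       /* complete escape */
                u->len = 0;
                u->value = 0;
                continue;
            }
            o = flush_held(u, out, o);           /* malformed: emit as typed */
        }
        if (c == '$') {                          /* state: normal text */
            u->held[0] = c;
            u->len = 1;
        } else {
            out[o++] = c;
        }
    }
    return o;
}

/* Call once at end of input to emit an unfinished escape verbatim. */
static size_t unescape_finish(unescaper *u, char *out)
{
    return flush_held(u, out, 0);
}
```

To use it, zero-initialise one `unescaper`, call `unescape_feed` on every chunk as it is read, send the bytes it returns to the other machine, and call `unescape_finish` after the last chunk. Capping the digit count also bounds how much has to be held back, which answers the “million digits” case by treating such input as ordinary text.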

    The regular expressions would be:

    [^$]
    

    Any character that is not a dollar sign. This could be augmented with a closure ([^$]* or [^$]+) to recognize a run of non-$ characters at a time, but an unbounded match could get very long.

    $[0-9]{1,3};
    

    This would recognize a dollar sign followed by 1 to 3 digits followed by a semicolon.

    [$]
    

    This would recognize just a dollar sign. It is in brackets because $ is special in many regular expression notations when it appears at the end of a pattern (which it does here), where it means “match only at the end of a line”.

    In this case it recognizes a dollar sign that was not matched by the other, longer pattern that begins with a dollar sign.

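    In lex you might have

    [^$]{1,1024}          { write_string(yytext); }
    $[0-9]{1,3};          { write_char(atoi(yytext + 1)); }
    [$]                   { write_char(*yytext); }
    

    and it would generate a .c file that functions as a filter similar to what you are asking for. The + 1 skips the leading $ in yytext; atoi stops at the first character that is not part of a number, so without it the call would return 0. write_string and write_char are functions you would supply yourself. You will need to read up a little more on how to use lex, though.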
