The Archive Base

Editorial Team
Asked: June 1, 2026

I have CSV files that have multiple columns that are sorted. For instance, I might have lines like this:

19980102,,PLXS,10032,Q,A,,,15.12500,15.00000,15.12500,2
19980105,,PLXS,10032,Q,A,,,14.93750,14.75000,14.93750,2
19980106,,PLXS,10032,Q,A,,,14.56250,14.56250,14.87500,2
20111222,,PCP,63830,N,A,,,164.07001,164.09000,164.12000,1
20111223,,PCP,63830,N,A,,,164.53000,164.53000,164.55000,1
20111227,,PCP,63830,N,A,,,165.69000,165.61000,165.64000,1

I would like to divide up the file based on the 3rd column, e.g. put PLXS and PCP entries into their own files called PLXS.csv and PCP.csv. Because the file happens to be pre-sorted, all of the PLXS entries are before the PCP entries and so on.
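Because the rows for each symbol are contiguous, a single streaming pass that keeps only one output file open at a time is enough. Below is a minimal Python sketch of that idea. The names `key_of` and `split_sorted_csv` are hypothetical, and the sketch assumes no quoted commas appear in the first three fields, which holds for the sample rows above.

```python
import itertools
import os


def key_of(line, key_col=2):
    # Hypothetical helper: returns the value of column key_col (0-based).
    # Assumes no quoted commas in the first key_col + 1 fields.
    return line.split(",", key_col + 1)[key_col]


def split_sorted_csv(src_path, out_dir="."):
    """Split a CSV already sorted on its 3rd column into one <key>.csv per key."""
    written = []
    with open(src_path, newline="") as src:
        nonblank = (line for line in src if line.strip())  # skip blank lines
        # groupby pulls lines lazily and yields one group per run of equal keys,
        # so only the current line and one output file are live at any time.
        for key, lines in itertools.groupby(nonblank, key=key_of):
            out_path = os.path.join(out_dir, key + ".csv")
            # "w" relies on each key forming one contiguous run (sorted input);
            # a key that reappeared later would overwrite its earlier rows.
            with open(out_path, "w", newline="") as out:
                out.writelines(lines)  # lines are copied byte-for-byte
            written.append(out_path)
    return written
```

For the sample rows, calling `split_sorted_csv` on a hypothetical `quotes.csv` would produce `PLXS.csv` and `PCP.csv` in `out_dir`. Opening the files with `newline=""` keeps the original line endings unchanged.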

I generally end up doing things like this in C++ since that’s the language I know the best, but in this case, my input CSV file is several gigabytes and too large to load into memory in C++.

Can somebody show how this can be accomplished? Perl, Python, PHP, or bash solutions are all fine; they just need to handle the huge file without excessive memory usage.

1 Answer

    Editorial Team
    Added an answer on June 1, 2026 at 5:22 am

    C++ is fine if you know it best. Why would you load the entire file into memory anyway?

    Since the output depends on the value in the column being read, you can keep a buffered stream for each output file and write each record to the appropriate file as you process it, cleaning up as you go so the memory footprint stays small.

    I do this (albeit in Java) when I need to take massive extracts from a database. The records are pushed into a buffered file stream and nothing else is kept in memory, so the program's footprint never grows beyond what it starts out at.

    Fly-by-the-seat-of-my-pants pseudocode:

    1. Create a map to hold your output file streams, keyed by the 3rd-column value
    2. Open a stream on the input file and read its contents one line at a time
    3. Have we already opened an output stream for this record's key?
      • Yes –
        • Get the stored file stream
        • Write the record to that stream
        • Flush the stream
      • No –
        • Create a stream and save it in our map of streams
        • Write the record to the stream
        • Flush the stream
    4. Rinse and repeat…

    Keep doing this until you reach the end of the file.

    Since we never store more than pointers to the streams, and we flush as soon as we write to a stream, the application never holds anything in memory other than one record from the input file. Thus the footprint stays manageable.
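The steps above translate fairly directly into a minimal Python sketch. It assumes the key is the 3rd column and that no quoted commas appear in the leading fields; `split_csv_by_column` is a hypothetical name. Because it keeps one stream open per key, it also works when the input is not sorted, at the cost of one open file per distinct key (with many thousands of symbols you may hit the operating system's open-file limit). One deliberate deviation from the pseudocode: it does not flush after every record, since each Python file object already writes through a fixed-size buffer and memory stays bounded either way. Adding `out.flush()` after the write would match the pseudocode literally, at the cost of many small writes.

```python
import os


def split_csv_by_column(src_path, out_dir=".", key_col=2):
    """Route each line to <key>.csv, keeping one open stream per key seen so far."""
    streams = {}  # step 1: key -> open output stream
    try:
        with open(src_path, newline="") as src:
            for line in src:  # step 2: read one line at a time
                if not line.strip():
                    continue  # skip blank lines
                # Assumption: no quoted commas in the first key_col + 1 fields.
                key = line.split(",", key_col + 1)[key_col]
                out = streams.get(key)
                if out is None:  # step 3, "No": create a stream and remember it
                    out = open(os.path.join(out_dir, key + ".csv"), "w", newline="")
                    streams[key] = out
                out.write(line)  # step 3, both branches: store the record
    finally:
        for out in streams.values():
            out.close()  # close() flushes whatever is still buffered
    return sorted(streams)  # the distinct keys that were written
```

Unlike a sorted-input approach, rows for a key that appears in several separate runs all end up in the same file, because its stream stays open for the whole pass.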

© 2021 The Archive Base. All Rights Reserved