
The Archive Base

Editorial Team
Asked: May 13, 2026

I have about 100 CSV files, each with 100,000 rows and 40 columns. I’d like to do some statistical analysis on them: pull out some sample data, plot general trends, do variance and R-squared analysis, and plot some spectral diagrams. For now, I’m considering NumPy for the analysis.
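For the NumPy route mentioned above, a minimal sketch of per-column variance and a linear-fit R² for a single file might look like this. It assumes every field after the header row is numeric; the inline sample data and the helper names `load_numeric_csv`, `column_variances` and `r_squared_linear` are hypothetical, and R² here is the ordinary least-squares definition 1 - SS_res / SS_tot.

```python
import io

import numpy as np

# Hypothetical stand-in for one CSV file: a header row plus numeric columns.
SAMPLE_CSV = """x,y,z
1.0,2.1,0.5
2.0,3.9,0.7
3.0,6.2,0.4
4.0,7.8,0.9
5.0,10.1,0.6
"""


def load_numeric_csv(source):
    # skiprows=1 drops the header; every remaining field must parse as a float
    return np.loadtxt(source, delimiter=",", skiprows=1)


def column_variances(data):
    # Sample variance (ddof=1) of every column in one vectorised call
    return data.var(axis=0, ddof=1)


def r_squared_linear(x, y):
    # Least-squares fit y ~ slope * x + intercept, then R^2 = 1 - SS_res / SS_tot
    slope, intercept = np.polyfit(x, y, 1)
    y_hat = slope * x + intercept
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


data = load_numeric_csv(io.StringIO(SAMPLE_CSV))
print(column_variances(data))
print(r_squared_linear(data[:, 0], data[:, 1]))
```

To process a real file, pass its path instead of the `StringIO` object (for example `load_numeric_csv("data_001.csv")`, a hypothetical filename); a single 100,000 x 40 file is small enough to load as one array.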

What issues should I expect with files this large? I’ve already checked for erroneous data. What are your recommendations for doing the statistical analysis? Would it be better to just split the files and do the whole thing in Excel?
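To put "files this large" in numbers, the sketch below estimates the raw in-memory size of the data, assuming all 40 columns are numeric and held as 64-bit floats (8 bytes each), as a NumPy `float64` array would store them. Parsing overhead and any text columns are ignored.

```python
# Rough in-memory size, assuming every value is a float64 (8 bytes).
N_FILES = 100
ROWS_PER_FILE = 100_000
COLS = 40
BYTES_PER_VALUE = 8  # float64


def dataset_bytes(n_files, rows, cols, bytes_per_value):
    # Raw array storage only; no parsing or container overhead
    return n_files * rows * cols * bytes_per_value


per_file = dataset_bytes(1, ROWS_PER_FILE, COLS, BYTES_PER_VALUE)
total = dataset_bytes(N_FILES, ROWS_PER_FILE, COLS, BYTES_PER_VALUE)
print(f"per file: {per_file / 1e6:.0f} MB")   # per file: 32 MB
print(f"all files: {total / 1e9:.1f} GB")     # all files: 3.2 GB
```

So one file is a few tens of megabytes and is easy to handle on its own, while loading all 100 at once may strain a machine with limited RAM. For the Excel option, 100,000 rows fits within the 1,048,576-row limit of a modern .xlsx worksheet but exceeds the 65,536-row limit of the older .xls format.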

  1. Editorial Team
    Added an answer on May 13, 2026 at 1:17 pm

    I’ve found that Python + CSV is probably the fastest and simplest way to do some kinds of statistical processing.
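As a concrete example of this kind of processing, a single-pass mean and sample variance per column can be computed straight from `csv.reader`, holding only one row in memory at a time. This sketch uses Welford's online algorithm; the function name `column_stats` and the inline sample are hypothetical, and it assumes every field is numeric once the header is skipped.

```python
import csv
import io


def column_stats(rows):
    """Single-pass mean and sample variance for every column (Welford's method).

    Only one row is held in memory at a time, so file size does not matter.
    Assumes every field is numeric; the caller skips any header row.
    """
    count = 0
    means = None
    m2 = None  # running sums of squared deviations from the mean
    for row in rows:
        values = [float(v) for v in row]
        if means is None:
            means = [0.0] * len(values)
            m2 = [0.0] * len(values)
        count += 1
        for i, x in enumerate(values):
            delta = x - means[i]
            means[i] += delta / count
            m2[i] += delta * (x - means[i])
    if count < 2:
        raise ValueError("need at least two rows for a sample variance")
    variances = [s / (count - 1) for s in m2]
    return count, means, variances


# Hypothetical two-column sample; a real file would be open("someFile", newline="")
sample = io.StringIO("a,b\n1,10\n2,20\n3,30\n4,40\n")
reader = csv.reader(sample)
next(reader)  # skip the header row
n, means, variances = column_stats(reader)
print(n, means, variances)
```

Because it streams, the same function works unchanged on a 100,000-row file.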

    We do a fair amount of reformatting and correcting for odd data errors, so Python helps us.
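A minimal sketch of that kind of correction as a generator, under assumed rules (whitespace around numbers is tolerated, empty fields become NaN, rows with the wrong field count or an unparseable value are dropped). The name `clean_rows` and its rules are hypothetical and would need adapting to whatever errors the data actually has.

```python
import csv
import io
import math


def clean_rows(source, n_cols):
    """Yield rows as lists of floats, repairing or skipping malformed rows.

    Illustrative rules only:
    - float() already tolerates surrounding whitespace such as " 2.0";
    - empty fields become math.nan so column positions are preserved;
    - rows with the wrong number of fields, or any unparseable field, are skipped.
    """
    for row in source:
        if len(row) != n_cols:
            continue  # truncated or over-long row
        try:
            values = [float(f) if f.strip() else math.nan for f in row]
        except ValueError:
            continue  # e.g. text such as "bad" in a numeric column
        yield values


raw = io.StringIO("1.0, 2.0\n3.0,\nbad,4.0\n5.0,6.0,7.0\n8.0,9.0\n")
rows = list(clean_rows(csv.reader(raw), n_cols=2))
print(rows)
```

Because it is a generator, it can sit in front of any other row-processing function without reading the whole file first.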

    Python’s functional programming features make this kind of processing particularly simple. You can do sampling and filtering with tools like this:

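    import csv

    def someFunction( row ):
        # Hypothetical predicate: keep rows whose first field is non-empty
        return bool( row ) and row[0].strip() != ""

    def someStatFunction( source ):
        # Placeholder statistic: count the rows it receives.
        # Replace the body with your real per-row processing.
        count = 0
        for row in source:
            count += 1
        return count

    def someFilterFunction( source ):
        # Generator: lazily yields only the rows that pass someFunction,
        # so the file is never loaded into memory all at once
        for row in source:
            if someFunction( row ):
                yield row

    # All rows (Python 3: open in text mode with newline="" for the csv module)
    with open( "someFile", newline="" ) as source:
        rdr = csv.reader( source )
        print( someStatFunction( rdr ) )

    # Filtered by someFilterFunction applied to each row
    with open( "someFile", newline="" ) as source:
        rdr = csv.reader( source )
        print( someStatFunction( someFilterFunction( rdr ) ) )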
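The same generator pattern covers sampling. The sketch below shows three hypothetical helpers: `every_nth` (systematic sampling via `itertools.islice`), `bernoulli_sample` (keep each row with a given probability) and `reservoir_sample` (a uniform sample of exactly k rows in one pass, Algorithm R). The names and the inline test data are illustrative assumptions, not part of the original answer.

```python
import csv
import io
import itertools
import random


def every_nth(source, n):
    # Systematic sample: rows 0, n, 2n, ...; islice consumes the input lazily
    return itertools.islice(source, 0, None, n)


def bernoulli_sample(source, fraction, seed=None):
    # Keep each row independently with probability `fraction`
    rng = random.Random(seed)
    for row in source:
        if rng.random() < fraction:
            yield row


def reservoir_sample(source, k, seed=None):
    # Uniform random sample of k rows (or all rows if fewer) in a single pass
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(source):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


data = io.StringIO("".join(f"{i},{i * i}\n" for i in range(100)))
rows = list(every_nth(csv.reader(data), 10))
print(rows[:3])
```

The first two helpers are generators, so they compose directly with the functions above, e.g. `someStatFunction( every_nth( rdr, 100 ) )`; `reservoir_sample` must see every row before it can answer, so it returns a list.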
    

    I really like being able to compose more complex functions from simpler functions.
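One way to make that composition explicit is a small `compose` helper that chains generator stages left to right. Everything here (`compose`, `skip_header`, `to_floats`, `positive_first_column`, `column_total`) is a hypothetical illustration rather than a library API.

```python
import csv
import functools
import io


def compose(*stages):
    """Chain stages left to right: compose(f, g)(src) == g(f(src))."""
    def pipeline(source):
        return functools.reduce(lambda acc, stage: stage(acc), stages, source)
    return pipeline


def skip_header(rows):
    rows = iter(rows)
    next(rows, None)  # drop the first row, if any
    yield from rows


def to_floats(rows):
    for row in rows:
        yield [float(v) for v in row]


def positive_first_column(rows):
    for row in rows:
        if row[0] > 0:
            yield row


def column_total(rows, col=1):
    return sum(row[col] for row in rows)


pipeline = compose(skip_header, to_floats, positive_first_column)
sample = io.StringIO("x,y\n-1,5\n2,10\n3,20\n")
result = column_total(pipeline(csv.reader(sample)))
print(result)  # 30.0
```

Each stage stays a few lines long and can be tested on its own, and the assembled pipeline still reads the file lazily, one row at a time.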

Related Questions

Ok so say I have 100 rows to insert and each row has about
I have about 150 000 rows of data written to a database everyday. These
I have 46 rows of information, 2 columns each row (Code Number, Description). These
I have about 100 sites built in a cms, each with its own database.
At my workplace we have one large Subversion repository which holds about 100 projects.
I have about 200 Excel files that are in standard Excel 2003 format. I
I have about 10million values that I need to put in some type of
So I have about 10 short css files that I use with mvc app.
I have a Server 2003 box with about 6 sites that each have about
I have about 100 posts in wordpress, all with a meta_key of price and

© 2021 The Archive Base. All Rights Reserved