I have about 100 csv files each 100,000 x 40 columns. I’d

Question

Asked: May 13, 2026 (2026-05-13T13:17:47+00:00)

I have about 100 CSV files, each roughly 100,000 x 40 (rows x columns). I’d like to do some statistical analysis on the data: pull out some sample data, plot general trends, do variance and R-squared analysis, and plot some spectra diagrams. For now, I’m considering NumPy for the analysis.

What issues should I expect with files this large? I’ve already checked for erroneous data. What are your recommendations for doing the statistical analysis? Would it be better to just split the files and do the whole thing in Excel?

1 Answer

Editorial Team · Answer 1 · 2026-05-13T13:17:47+00:00

I’ve found that Python + CSV is probably the fastest and simplest way to do some kinds of statistical processing.

We do a fair amount of reformatting and correcting for odd data errors, so Python helps us.

Python’s functional programming features, generators in particular, make this particularly simple. You can do sampling with tools like this:

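import csv

def someStatFunction( source ):
    # Consume rows one at a time, so the whole file never has to fit in memory.
    for row in source:
        pass  # ...some processing...

def someFilterFunction( source ):
    # Generator: pass along only the rows accepted by someFunction,
    # a placeholder predicate that you supply.
    for row in source:
        if someFunction( row ):
            yield row

# All rows (in Python 3, open CSV files in text mode with newline="")
with open( "someFile", newline="" ) as source:
    rdr = csv.reader( source )
    someStatFunction( rdr )

# Filtered by someFilterFunction applied to each row
with open( "someFile", newline="" ) as source:
    rdr = csv.reader( source )
    someStatFunction( someFilterFunction( rdr ) )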

I really like being able to compose more complex functions from simpler functions.
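As a rough sketch of that kind of composition, here is one way the placeholders above might be filled in. The names parse_floats, sample_rows and column_mean_variance are my own illustrations, not part of any library, and the sketch assumes every data row has the same number of numeric columns. The variance uses Welford's one-pass update, which is one reasonable choice rather than the only one.

```python
import csv
import io
import random


def parse_floats(rows):
    """Convert each row of strings to floats; skip blank or unparseable rows."""
    for row in rows:
        if not row:
            continue  # csv.reader yields [] for blank lines
        try:
            yield [float(cell) for cell in row]
        except ValueError:
            continue  # e.g. a header line or stray text


def sample_rows(rows, fraction, seed=0):
    """Pass each row through with probability `fraction` (simple random sampling)."""
    rng = random.Random(seed)
    for row in rows:
        if rng.random() < fraction:
            yield row


def column_mean_variance(rows):
    """One-pass (Welford) mean and sample variance for every column."""
    count = 0
    means = []
    m2 = []  # running sum of squared deviations per column
    for row in rows:
        if count == 0:
            means = [0.0] * len(row)
            m2 = [0.0] * len(row)
        count += 1
        for i, x in enumerate(row):
            delta = x - means[i]
            means[i] += delta / count
            m2[i] += delta * (x - means[i])
    if count > 1:
        variances = [s / (count - 1) for s in m2]
    else:
        variances = [0.0] * len(m2)
    return count, means, variances


# In-memory stand-in for a real file; with a file, use open(path, newline="").
demo = io.StringIO("a,b\n1,10\n2,20\noops,30\n3,30\n")
count, means, variances = column_mean_variance(parse_floats(csv.reader(demo)))
print(count, means, variances)  # prints: 3 [2.0, 20.0] [1.0, 100.0]
```

To work on a sample instead of every row, wrap one more generator around the reader, for example column_mean_variance(sample_rows(parse_floats(rdr), 0.1)). Because each stage handles one row at a time, the same pipeline can be run over all of the files in turn without loading any of them fully into memory.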
