I have large data files stored in S3 that I need to analyze. Each

Question


Asked: May 11, 2026


I have large data files stored in S3 that I need to analyze. Each batch consists of ~50 files, each of which can be analyzed independently.

I’d like to set up parallel downloads of the S3 data onto the EC2 instance, and set up triggers that start the analysis on each file as soon as it finishes downloading.

Are there any libraries that handle an async-download, trigger-on-complete model?

If not, I’m thinking of setting up multiple download processes with pyprocessing, each of which will download and analyze a single piece of the file. Does that sound reasonable or are there better alternatives?
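To make the download-then-trigger idea concrete, here is a minimal sketch using only the standard library's `concurrent.futures`. It is one possible shape, not a specific library's API: `download`, `analyze`, and `process_batch` are hypothetical names, and `download` is a stand-in that fakes the S3 fetch so the example runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download(key):
    """Hypothetical stand-in: fetch one S3 object and return its bytes.

    Assumption: a real version would call an S3 client here instead of
    fabricating the payload.
    """
    return ("contents of " + key).encode()


def analyze(key, data):
    """Hypothetical stand-in for the per-file analysis."""
    return key, len(data)


def process_batch(keys, fetch=download, max_workers=8):
    """Download all keys in parallel; analyze each file as it arrives."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start every download up front; threads suit I/O-bound work.
        futures = {pool.submit(fetch, key): key for key in keys}
        # as_completed yields each future when its download finishes,
        # which acts as the "trigger on complete" step.
        for future in as_completed(futures):
            key = futures[future]
            name, size = analyze(key, future.result())
            results[name] = size
    return results


if __name__ == "__main__":
    batch = ["file-%02d.dat" % i for i in range(50)]
    print(len(process_batch(batch)))
```

In this sketch the analysis runs serially in the main thread. If the analysis is CPU-bound, the alternative described above (one worker process per file that both downloads and analyzes) may be the better fit, for example by submitting a combined download-and-analyze function to a `ProcessPoolExecutor`.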


1 Answer

score 0 · Answer 1 · 2026-05-11T11:04:07+00:00

Added an answer on May 11, 2026 at 11:04 am

Answering my own question: I ended up writing a simple modification to the Amazon S3 Python library that lets you download a file in chunks or read it line by line. Available here.
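The modification itself is not reproduced here, so the following is only a rough sketch of the idea, not the linked code. It assumes the S3 response is exposed as a file-like object with a `read(size)` method; `iter_chunks` and `iter_lines` are hypothetical helper names, and an in-memory `io.BytesIO` stands in for the S3 stream.

```python
import io


def iter_chunks(stream, chunk_size=1024 * 1024):
    """Yield successive byte chunks from a file-like object."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def iter_lines(stream, chunk_size=1024 * 1024):
    """Yield lines (without the trailing newline) from a byte stream."""
    pending = b""
    for chunk in iter_chunks(stream, chunk_size):
        lines = (pending + chunk).split(b"\n")
        # The last piece may be an incomplete line; hold it back until
        # the next chunk arrives.
        pending = lines.pop()
        for line in lines:
            yield line
    if pending:
        yield pending


if __name__ == "__main__":
    # Stand-in for an S3 response body.
    body = io.BytesIO(b"a,1\nb,2\nc,3")
    for line in iter_lines(body, chunk_size=4):
        print(line)
```

Reading this way lets the analysis start before the whole file has arrived and keeps memory use bounded by the chunk size. For what it's worth, the response body that boto3 returns from `get_object` already offers `iter_chunks` and `iter_lines` methods, so a hand-written helper like this may not be needed with that library.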
