I do data mining research and often have Python scripts that load large datasets

Asked: June 17, 2026

I do data mining research and often have Python scripts that load large datasets from SQLite databases, CSV files, pickle files, etc. During development I change these scripts frequently, and every run means waiting 20 to 30 seconds for the data to load.

Streaming the data (e.g. from a SQLite database) sometimes works, but not in all situations: if I need to go back into a dataset often, I’d rather pay the upfront time cost of loading all of it.

My best solution so far is subsampling the data until I’m happy with my final script. Does anyone have a better solution/design practice?

My “ideal” solution would involve using the Python debugger (pdb) cleverly so that the data remains loaded in memory, I can edit my script, and then resume from a given point.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T14:08:27+00:00

One way to do this is to keep your manipulation and loading code in separate files, X.py and Y.py, and have X.py begin with:

import Y
data = Y.load()
# ... your code ...

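While you’re developing X.py, leave the import Y and data = Y.load() lines out of the file and run them by hand in an interactive shell, so that data stays in memory for the whole session. Then modify X.py and import it in the shell to test your code. Note that repeating import X in the same session does nothing, because Python caches modules that are already imported; after each edit, call importlib.reload(X) instead. For the code in X.py to see the data held by the shell, it helps to wrap it in a function that takes data as an argument.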
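To make the workflow concrete, here is a minimal sketch that plays it out in a single script rather than by hand in a shell. The names are assumptions for illustration only: analysis_x stands in for X.py, load() stands in for Y.load(), and run(data) is a hypothetical function wrapping "your code". The script writes the module to a temporary directory, loads the data once, then rewrites the module and reloads it while the data stays in memory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The demo rewrites the module file within the same second, so skip
# bytecode caching to be sure the new source is recompiled on reload.
# (Not needed when you edit the file by hand.)
sys.dont_write_bytecode = True

workdir = Path(tempfile.mkdtemp())
module_path = workdir / "analysis_x.py"  # hypothetical stand-in for X.py
sys.path.insert(0, str(workdir))

load_calls = 0


def load():
    """Stand-in for Y.load(): imagine this takes 20 to 30 seconds."""
    global load_calls
    load_calls += 1
    return list(range(1, 11))


# First version of the analysis code.
module_path.write_text("def run(data):\n    return sum(data)\n")

data = load()  # pay the loading cost once
import analysis_x

first = analysis_x.run(data)

# "Edit" the script, then reload it; data is not loaded again.
module_path.write_text("def run(data):\n    return sum(data) / len(data)\n")
importlib.reload(analysis_x)
second = analysis_x.run(data)

print(first, second, load_calls)  # prints: 55 5.5 1
```

In an interactive session the equivalent steps would be: run the loading lines once, edit the file in your editor, then call importlib.reload on the module and call its function again. One caveat: objects created from the old version of the module (for example, instances of its classes) keep the old code after a reload.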
