I have a spreadsheet with about 1.7m lines, totalling 1 GB, and need to


Asked: May 31, 2026


I have a spreadsheet with about 1.7m lines, totalling 1 GB, and need to perform queries on it.

Being most comfortable with Python, my first approach was to hack together a bunch of dictionaries keyed in a way that would facilitate the queries I was trying to make. E.g. if I needed to access everyone with a particular area code and age, I would make an areacode_age 2-dimensional dictionary. I ended up needing quite a few, which multiplied the memory footprint to ~10 GB, and even though I had enough RAM the process was slow.

Next, I imported sqlite3 and loaded my data into an in-memory database. It turns out that a query like SELECT a, b, c FROM foo WHERE date1<=d AND date2>e AND name=f takes 0.05 seconds. Doing this once for each of my 1.7m rows would take about 24 hours. My hacky approach with dictionaries was about 3 orders of magnitude faster for this particular task (and, in this example, I couldn't key on date1 and date2, so I was getting every row that matched name and then filtering by date).

Why is this so slow, and how can I make it fast? And what is the Pythonic approach? Possibilities I’ve been considering:

sqlite3 is too slow and I need something more heavyweight.
I need to change my schema or my queries to be more optimized.
I need a new tool of some kind.
I read somewhere that in sqlite3, doing repeated calls to cursor.execute is much slower than using cursor.executemany. It turns out executemany isn't compatible with SELECT statements though, so I think this was a red herring.


1 Answer

Editorial Team · Answer 1 · 2026-05-31T12:38:42+00:00


"sqlite3 is too slow, and I need something more heavyweight"

First, sqlite3 is fast, sometimes faster than MySQL.

Second, you have to use an index. Putting a compound index on (date1, date2, name) will speed things up significantly.
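As a rough illustration of the indexing advice, here is a minimal sketch using Python's built-in sqlite3 module. The table name foo and the columns a, b, c, date1, date2, name come from the question; the column types, the sample rows, the index name idx_foo_name_dates and the helper functions are assumptions made up for the example. Note that the sketch puts the equality column first, (name, date1, date2), rather than the (date1, date2, name) order given above, because SQLite can generally use a compound index left to right only up to the first range constraint. Whether that order is the better one for the real data is an assumption worth checking with EXPLAIN QUERY PLAN.

```python
import sqlite3

# Query from the question, written with placeholders instead of literal values.
QUERY = "SELECT a, b, c FROM foo WHERE date1 <= ? AND date2 > ? AND name = ?"


def build_demo_db():
    """Create a tiny in-memory table shaped like the one in the question.

    Column types and sample rows are hypothetical; dates are stored as
    ISO-8601 text so that string comparison matches date order.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE foo ("
        "a INTEGER, b INTEGER, c INTEGER, "
        "date1 TEXT, date2 TEXT, name TEXT)"
    )
    rows = [
        (1, 10, 100, "2020-01-01", "2020-06-01", "alice"),
        (2, 20, 200, "2020-03-01", "2020-04-01", "alice"),
        (3, 30, 300, "2020-02-01", "2020-12-01", "bob"),
        (4, 40, 400, "2021-01-01", "2021-06-01", "alice"),
    ]
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Compound index: equality column (name) first, range columns after it.
    conn.execute("CREATE INDEX idx_foo_name_dates ON foo (name, date1, date2)")
    conn.commit()
    return conn


def run_query(conn, d, e, f):
    """Run the question's query for one (d, e, f) combination."""
    return conn.execute(QUERY, (d, e, f)).fetchall()


def query_plan(conn, d, e, f):
    """Return the human-readable steps of SQLite's plan for the query."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (d, e, f))
    # The last column of each plan row is the descriptive text.
    return [row[-1] for row in cursor]


if __name__ == "__main__":
    conn = build_demo_db()
    print(run_query(conn, "2020-03-01", "2020-05-01", "alice"))
    for step in query_plan(conn, "2020-03-01", "2020-05-01", "alice"):
        print(step)
```

If the plan output mentions the index name rather than a full table scan, the query is being served by the index. On the real 1.7m-row table, create the index once after the bulk load and before running the queries.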
