Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7655941
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 31, 20262026-05-31T12:38:40+00:00 2026-05-31T12:38:40+00:00

I have a spreadsheet with about 1.7m lines, totalling 1 GB, and need to

  • 0

I have a spreadsheet with about 1.7m lines, totalling 1 GB, and need to perform queries on it.

Being most comfortable with Python, my first approach was to hack together a bunch of dictionaries keyed in a way that would facilitate the queries I was trying to make. E.g. if I needed to access everyone with a particular area code and age I would make an areacode_age 2-dimensional dictionary. I ended up needing quite a few which multiplied memory footprint to ~10GB, and even though I had enough RAM the process was slow.

I imported sqlite3 and imported my data into an in-memory database. Turns out doing a query like SELECT (a, b, c) FROM foo WHERE date1<=d AND date2>e AND name=f takes 0.05 seconds. Doing this for my 1.7m rows would take 24 hours. My hacky approach with dictionaries was about 3 orders of magnitude faster for this particular task (and, in this example, I couldn’t key on date1 and date2, so I was getting every row that matched name and then filtering by date).

Why is this so slow, and how can I make it fast? And what is the Pythonic approach? Possibilities I’ve been considering:

  • sqlite3 is too slow and I need something more heavyweight.
  • I need to change my schema or my queries to be more optimized.
  • I need a new tool of some kind.
  • I read somewhere that in sqlite 3, doing repeated calls to cursor.execute is much slower than using cursor.executemany. It turns out executemany isn’t compatible with select statements though, so I think this was a red herring.
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-31T12:38:42+00:00Added an answer on May 31, 2026 at 12:38 pm

    sqlite3 is too slow, and I need something more heavyweight

    First, sqlite3 is fast, sometime faster than MySQL

    Second, you have to use index, put a compound index in (date1, date2, name) will speed thing up significantly

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a huge spreadsheet of about 300001 rows and 20 columns. The first
I have an excel spreadsheet with about 700 entries. The names are listed as
I have a user that created about 50 sheets inside a Google Spreadsheet. Each
i have an excel spreadsheet of about 3 million cells. i asked the following
i have an excel spreadsheet that is about 300,000 rows and about 100 columns
i have about 300,000 records in this spreadsheet. and there are a couple hundred
I have a spreadsheet of addresses and I need to calculate the distance between
I have a spreadsheet that has 2 columns, the first is a unique identifier,
I have an Excel spreadsheet with 1 column, 700 rows. I care about every
I have an excel spreadsheet that has about 18k rows and three columns. I

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.