The Archive Base

Editorial Team
Asked: June 1, 2026

I have a huge database with rows structured by the fields date, ad, site,

I have a huge database with rows structured by the fields “date, ad, site, impressions, clicks”.

I fetched all of them in Python using:

cursor.execute("SELECT * FROM dabase")  # the query must be passed as a string
data = cursor.fetchall()

From all this data, I need to keep only the rows whose (ad, site) pair led to more than zero clicks at some point in time. For instance:

row(1) : (t1, ad1, site1) -> clicks = 1 (t is time)

row(2) : (t2, ad1, site1) -> clicks = 0

So ad1 on site1 had clicks > 0 at time t1, and therefore every row in data containing ad1 and site1 must be taken and put into another list, which I called final_list. That list would contain both row(1) and row(2): row(2) has 0 clicks, but since ad1 on site1 had clicks > 0 at time t1, this row must be taken as well.
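The selection rule amounts to: keep every row whose (ad, site) pair has clicks > 0 in at least one row. A minimal sketch of that rule, assuming rows are tuples in the field order (date, ad, site, impressions, clicks). It groups rows by pair with `collections.defaultdict` and keeps the groups that contain any click. The function name `select_clicked_pairs` and the sample rows are hypothetical, chosen to mirror row(1) and row(2) above.

```python
from collections import defaultdict


def select_clicked_pairs(data):
    """Return every row whose (ad, site) pair has clicks > 0 in at least one row.

    Rows are grouped by (ad, site), so the output is ordered by the first
    appearance of each pair rather than by original row position.
    """
    # Rows follow the question's field order: (date, ad, site, impressions, clicks)
    groups = defaultdict(list)
    for row in data:
        groups[(row[1], row[2])].append(row)

    final_list = []
    for rows in groups.values():
        # any() stops at the first row with a non-zero click count
        if any(r[4] > 0 for r in rows):
            final_list.extend(rows)
    return final_list


# Hypothetical sample mirroring row(1) and row(2) from the question,
# plus one pair (ad2, site1) that never received a click.
data = [
    ("t1", "ad1", "site1", 10, 1),
    ("t2", "ad1", "site1", 8, 0),
    ("t1", "ad2", "site1", 5, 0),
]

print(select_clicked_pairs(data))
# [('t1', 'ad1', 'site1', 10, 1), ('t2', 'ad1', 'site1', 8, 0)]
```

Both ad1/site1 rows are kept, including the zero-click one, while the ad2/site1 row is dropped because that pair never had a click.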

When I tried doing this in MySQL Workbench, it took so long that I got the error message “Lost Connection to Database”. I think this happens because the table has almost 40 million rows; even though I've seen people here working with much bigger amounts of data, MySQL is not able to handle it. That's why I used Python (in fact, getting the rows with clicks > 0 took a few seconds in Python, whereas it took more than 10 minutes via MySQL; I'm not sure precisely how long).

What I did then was to first select the (ad, site) points with clicks > 0:

points = [(row[1], row[2]) for row in data if row[4]]  # (ad, site) pairs with clicks > 0
points = list(set(points))  # drop duplicate pairs
dic = {}
for element in points:
    dic[element] = 1  # mark each wanted pair

This code took just a few seconds to run. With a dictionary of the wanted points, I began inserting data into final_list:

final_list = []
for row in data:
    try:
        # raises KeyError when the (ad, site) pair is not in dic
        if dic[(row[1], row[2])] == 1:
            final_list.append(row)
    except KeyError:
        continue

But it's taking too long, and I've been trying to figure out a way to make it faster. Is it possible?
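One likely contributor, offered as an assumption rather than a measurement on the real table: in the loop above, every row whose pair is not in `dic` raises and catches a `KeyError`, and raising an exception in CPython costs considerably more than a plain membership test. The sketch below compares the two loops on synthetic data and checks that they select the same rows. The row generator, the sizes, and the function names are hypothetical, and the timings will vary by machine.

```python
import random
import timeit

random.seed(0)

# Hypothetical synthetic rows: (date, ad, site, impressions, clicks),
# where roughly 1% of rows have a click, so most (ad, site) pairs never match.
data = [
    (i, random.randrange(1000), random.randrange(50), 1, int(random.random() < 0.01))
    for i in range(200_000)
]
points = {(row[1], row[2]) for row in data if row[4]}
dic = dict.fromkeys(points, 1)


def with_try_except():
    # The question's pattern: a KeyError is raised for every unmatched pair.
    final_list = []
    for row in data:
        try:
            if dic[(row[1], row[2])] == 1:
                final_list.append(row)
        except KeyError:
            continue
    return final_list


def with_membership_test():
    # Same selection using a set membership test, with no exceptions raised.
    return [row for row in data if (row[1], row[2]) in points]


# Both approaches must select exactly the same rows, in the same order.
assert with_try_except() == with_membership_test()
for fn in (with_try_except, with_membership_test):
    print(fn.__name__, timeit.timeit(fn, number=3))
```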

I appreciate any help!

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 2:41 am

    I know the comments have asked why you aren't able to just do this in the database, which I wonder as well... but as for addressing your code, you probably don't need the intermediate steps such as converting list -> set -> list -> dictionary. I'm sure the repeated list append() calls are killing you, as well as the for loops.

    What about this?

    points = set((row[1], row[2]) for row in data if row[4])
    final_list = [d for d in data if (d[1], d[2]) in points]
    

    You could even see if this is faster to get your point set:

    from operator import itemgetter
    
    getter = itemgetter(1, 2)  # pulls the (ad, site) pair out of a row
    # Python 3: the built-in filter replaces itertools.ifilter (Python 2 only)
    points = set(map(getter, filter(itemgetter(4), data)))
    final_list = [d for d in data if getter(d) in points]
    

    My answer gives your question the benefit of the doubt that you have no option for doing this regularly in SQL with a better query.
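For comparison, a better SQL query could look something like the sketch below: a derived table of the distinct (ad, site) pairs with clicks > 0, joined back to the full table. It uses Python's built-in `sqlite3` with an in-memory database purely so the example runs anywhere. The table name `ad_stats`, the index name, and the sample rows are hypothetical, and whether this avoids the timeout on roughly 40 million rows in MySQL is untested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table mirroring the question's fields.
cur.execute(
    'CREATE TABLE ad_stats ("date" TEXT, ad TEXT, site TEXT, impressions INTEGER, clicks INTEGER)'
)
cur.executemany(
    "INSERT INTO ad_stats VALUES (?, ?, ?, ?, ?)",
    [
        ("t1", "ad1", "site1", 10, 1),
        ("t2", "ad1", "site1", 8, 0),
        ("t1", "ad2", "site1", 5, 0),
    ],
)
# An index on (ad, site) can let the join look up matching rows instead of scanning.
cur.execute("CREATE INDEX idx_ad_site ON ad_stats (ad, site)")

# Keep every row whose (ad, site) pair has at least one row with clicks > 0.
query = """
    SELECT s.*
    FROM ad_stats AS s
    JOIN (SELECT DISTINCT ad, site FROM ad_stats WHERE clicks > 0) AS p
      ON s.ad = p.ad AND s.site = p.site
    ORDER BY s."date"
"""
final_list = cur.execute(query).fetchall()
print(final_list)
# [('t1', 'ad1', 'site1', 10, 1), ('t2', 'ad1', 'site1', 8, 0)]
conn.close()
```

The filtering then happens inside the database, so only the matching rows travel to Python instead of the whole table.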

Related Questions

I have a huge database (800MB) which consists of a field called 'Date Last
I have just imported a huge MySQL database. Most fields are latin1_swedish_ci, and they
I got a huge PostgreSQL database with lots of tables. I want learn all
We have a large POSTGRESQL transactional database (around 70 million rows in all), and
I have a huge dataset (around 5 000 000 rows in a database) which
I have a huge database with 100's of tables and stored procedures. Using SQL
I have a huge database with some 100 tables and some 250 stored procedures.
I have a huge database which holds pairs of numbers (A,B), each ranging from
I have a fairly huge database with a master table with a single column
I have a database containing a single huge table. At the moment a query

© 2021 The Archive Base. All Rights Reserved