Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8039251
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 5, 20262026-06-05T03:30:50+00:00 2026-06-05T03:30:50+00:00

I have a huge python list(A) of lists. The length of list A is

  • 0

I have a huge python list(A) of lists. The length of list A is around 90,000. Each inner list contain around 700 tuples of (datetime.date,string). Now, I am analyzing this data. What I am doing is I am taking a window of size x in inner lists where- x = len(inner list) * (some fraction <= 1) and I am saving each ordered pair (a,b) where a occurs before b in that window (actually the innerlists are sorted wrt time). I am moving this window upto the last element adding one element at a time from one end and removing from other which takes O(window-size)time as I am considering the new tuples only. My code:

for i in xrange(window_size):
        j = i+1;
        while j<window_size:
            check_and_update(cur, my_list[i][1], my_list[j][1],log);
            j=j+1

    i=1;
    while i<=len(my_list)-window_size: 
        j=i;
        k=i+window_size-1;
        while j<k:
            check_and_update(cur, my_list[j][1], my_list[k][1],log);  
            j+=1
        i += 1  

Here cur is actually a sqlite3 database cursor,my_list is a list containing the tuples and I iterate this code for all the lists in A and log is a opened logfile. In method check_and_update() I am looking up my database to find the tuple if exists or else I insert it, along with its total number of occurrence so far. Code:

def check_and_update(cur,start,end,log):    
    t = str(start)+":"+ str(end)
    cur.execute("INSERT OR REPLACE INTO Extra (tuple,count)\
                 VALUES ( ? , coalesce((SELECT count +1 from Extra WHERE tuple = ?),1))",[t,t])

As expected this number of tuples is HUGE and I have previously experimented with dictionary which eats up the memory quite fast. So, I resorted to SQLite3, but now it is too slow. I have tried indexing but with no help. Probably the my program is spending way to much time querying and updating the database. Do you have any optimization ideas for this problem? Probably changing the algorithm or some different approach/tools. Thank you!

Edit: My goal here is to find the total number of tuples of strings that occur within the window grouped by the number of different innerlists they occur in. I extract this information with this query:

for i in range(1,size+1):       
        cur.execute('select * from Extra where count = ?',str(i))
        #other stuff

For Example ( I am ignoring the date entries and will write them as ‘dt’):

My_list = [
            [ ( dt,'user1') , (dt, 'user2'), (dt, 'user3') ]
            [ ( dt,'user3') , (dt, 'user4')]
            [ ( dt,'user2') , (dt, 'user3'), (dt,'user1') ]
          ]

here if I take fraction = 1 then, results:

only 1 occurrence in window: 5 (user 1-2,1-3,3-4,2-1,3-1)
only 2 occurrence in window: 2 (user 2-3)
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-05T03:30:51+00:00Added an answer on June 5, 2026 at 3:30 am

    Let me get this straight.

    You have up to about 22 billion potential tuples (for 90000 lists, any of 700, any of the following entries, on average 350) which might be less depending on the window size. You want to find, but number of inner lists that they appear in, how many tuples there are.

    Data of this size has to live on disk. The rule for data that lives on disk due to size is, “Never randomly access, instead generate and then sort.”

    So I’d suggest that you write out each tuple to a log file, one tuple per line. Sort that file. Now all instances of any given tuple are in one place. Then run through the file, and for each tuple emit the count of how many times it appears in (that is how many inner lists it is in). Sort that second file. Now run through that file, and you can extract how many tuples appeared 1x, 2x, 3x, etc.

    If you have multiple machines, it is easy to convert this into a MapReduce. (Which is morally the same approach, but you get to parallelize a lot of stuff.)

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a huge dataset (around 5 000 000 rows in a database) which
I have a huge list of numbers (each number is less or equal to
I have a huge list of datetime strings like the following ["Jun 1 2005
I have a huge checkbox list with different levels and I would like to
I have a huge java codebase (more than 10,000 java classes) that makes extensive
I have a huge text and a list of words ~10K. What is the
I have a list of approximately 300 words and a huge amount of text
I have huge list (200000) of strings (multi word). I want to group these
Suppose that I have a huge SQLite file (say, 500[MB]). Can 10 different python
In Python, I have an application that writes a list to a CSV file,

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.