
The Archive Base

Editorial Team
Asked: May 27, 2026


I need to match two very large NumPy arrays (one has 20000 rows, the other about 100000 rows), and I am trying to write a script that does this efficiently. Simply looping over the arrays is very slow; can someone suggest a better way? Here is what I am trying to do: the arrays datesSecondDict and pwfs2Dates contain datetime values. For each datetime value in pwfs2Dates (the smaller array), I need to check whether datesSecondDict contains a matching datetime value within plus or minus 5 minutes (there might be more than one). If there is one (or more), I fill a new array (the same size as pwfs2Dates) with the value (or one of the values) from valsSecondDict, which holds the numerical values corresponding to datesSecondDict. Here is a solution by @unutbu and @joaquin that worked for me (thanks, guys!):

import datetime as dt
import numpy as np

def combineArs(dict1, dict2):
    """Combine data from 2 dictionaries into an array.
    dict1 contains the primary data (e.g. the seeing parameter).
    The function compares each timestamp in dict1 to dict2
    to see if there is a matching timestamp record(s)
    in dict2 (plus/minus 5 minutes).
    == If yes: the corresponding parameter value from dict2
    is appended to the list `data`.
    (If more than one record matches, the first occurring
    value is appended.)
    == If no: 0 is appended to `data`.
    Both timestamp sequences must be sorted in ascending order."""
    # Specify the keys to use
    pwfs2Key = 'pwfs2:dc:seeing'
    dimmKey = 'ws:seeFwhm'

    # Create an iterator over the primary dict's timestamps
    datesPrimDict = dict1[pwfs2Key]['datetimes']
    datesPrimDictIter = iter(datesPrimDict)

    # Take the first timestamp value in the primary dict
    nextDatePrimDict = next(datesPrimDictIter)

    # Split the second dictionary into lists
    datesSecondDict = dict2[dimmKey]['datetime']
    valsSecondDict = dict2[dimmKey]['values']

    # Define the time window
    fiveMins = dt.timedelta(minutes=5)
    data = []
    for i, nextDateSecondDict in enumerate(datesSecondDict):
        try:
            while nextDatePrimDict < nextDateSecondDict - fiveMins:
                # No match: append zero and move on
                data.append(0)
                nextDatePrimDict = next(datesPrimDictIter)
            while nextDatePrimDict < nextDateSecondDict + fiveMins:
                # Match: append the value from the second dict
                data.append(valsSecondDict[i])
                nextDatePrimDict = next(datesPrimDictIter)
        except StopIteration:
            # Every primary timestamp has been consumed
            break
    # If the second dict runs out first, the remaining primary timestamps
    # have no match: pad with zeros so there is one entry per primary timestamp
    data.extend([0] * (len(datesPrimDict) - len(data)))
    return np.array(data)

Thanks,
Aina.
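Not part of the original thread: since the question asks for a NumPy approach, here is a minimal vectorized sketch using np.searchsorted, which replaces the Python-level loop with one binary search per primary timestamp (roughly O(n log m) for n primary and m secondary timestamps). It assumes both timestamp sequences are sorted ascending and convertible to datetime64 (stored here at microsecond resolution); the function name match_within_window and its fill parameter are hypothetical. It uses the same window as combineArs: a primary timestamp t takes the value of the earliest secondary timestamp s with t - 5 min < s <= t + 5 min, and 0 otherwise.

```python
import numpy as np

def match_within_window(primary, secondary, values,
                        window=np.timedelta64(5, "m"), fill=0):
    """For each primary timestamp t, return the value paired with the
    earliest secondary timestamp s such that t - window < s <= t + window,
    or `fill` when there is none.

    Assumption: both timestamp sequences are sorted in ascending order.
    """
    primary = np.asarray(primary, dtype="datetime64[us]")
    secondary = np.asarray(secondary, dtype="datetime64[us]")
    values = np.asarray(values)
    if len(secondary) == 0:
        return np.full(len(primary), fill, dtype=values.dtype)

    # Position of the first secondary timestamp strictly after t - window
    idx = np.searchsorted(secondary, primary - window, side="right")
    # Clip so indexing never goes out of bounds; clipped rows are masked out
    safe = np.minimum(idx, len(secondary) - 1)
    matched = (idx < len(secondary)) & (secondary[safe] <= primary + window)

    out = np.full(len(primary), fill, dtype=values.dtype)
    out[matched] = values[safe[matched]]
    return out
```

Python datetime objects are converted by np.asarray, so lists of datetimes work as well as datetime64 arrays.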


1 Answer

    Editorial Team
    Added an answer on May 27, 2026 at 2:25 pm

    Are the date arrays sorted?

    • If yes, you can speed up the comparison by breaking out of the inner
      loop as soon as its dates are larger than the date from the
      outer loop. That way you make a single pass over the data instead of
      looping over all dimVals items len(pwfs2Vals) times.
    • If no, you could transform the current pwfs2Dates array into, for example,
      an array of pairs [(date, array_index), ...] and then sort all your arrays
      by date. This lets you make the one-pass comparison described above while
      keeping the original indexes needed to set data[i].
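A NumPy version of the second bullet's idea, as a sketch: np.argsort with a stable sort plays the role of sorting [(date, array_index), ...] pairs, and scattering through the same index array puts per-element results back in the original order. The helper names sort_with_index and unsort are hypothetical.

```python
import numpy as np

def sort_with_index(dates):
    """Return the dates in ascending order together with the original
    position of each sorted element (like sorting (date, idx) pairs)."""
    dates = np.asarray(dates)
    order = np.argsort(dates, kind="stable")
    return dates[order], order

def unsort(sorted_result, order):
    """Scatter results computed in sorted order back to the original positions."""
    sorted_result = np.asarray(sorted_result)
    out = np.empty_like(sorted_result)
    out[order] = sorted_result
    return out
```

After matching on the sorted dates, unsort(result, order) gives an array aligned with the unsorted input.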

    For example, if the arrays were already sorted (I use lists here; I'm not sure you need arrays for this).
    Edited: now using an iterator so that pwfs2Dates is not scanned from the beginning on each step:

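    import datetime as dt

    fiveMinutes = dt.timedelta(minutes=5)
    data = [0] * len(pwfs2Dates)

    pdates = iter(enumerate(pwfs2Dates))
    try:
        i, datei = next(pdates)
        for datej, valuej in zip(dimmDates, dimVals):
            while datei < datej - fiveMinutes:
                i, datei = next(pdates)
            while datei < datej + fiveMinutes:
                data[i] = valuej
                i, datei = next(pdates)
    except StopIteration:
        pass  # every pwfs2 date has been visited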
    

    Otherwise, if they were not ordered and you created sorted, indexed lists like this:

    pwfs2Dates = sorted([(date, idx) for idx, date in enumerate(pwfs2Dates)])
    dimmDates = sorted([(date, idx) for idx, date in enumerate(dimmDates)])
    

    the code would be (edited: now using an iterator so that pwfs2Dates is not scanned from the beginning on each step):

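    import datetime as dt

    fiveMinutes = dt.timedelta(minutes=5)
    data = [0] * len(pwfs2Dates)

    pdates = iter(pwfs2Dates)
    try:
        datei, i = next(pdates)
        for datej, j in dimmDates:
            while datei < datej - fiveMinutes:
                datei, i = next(pdates)
            while datei < datej + fiveMinutes:
                data[i] = dimVals[j]
                datei, i = next(pdates)
    except StopIteration:
        pass  # every pwfs2 date has been visited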
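For reference, a self-contained Python 3 sketch of the indexed-sort loop above, wrapped in a function (the name match_unsorted and the window parameter are hypothetical). It uses next() and try/except to stop once every pwfs2 date has been visited, and leaves 0 for dates with no match within the window.

```python
import datetime as dt

def match_unsorted(pwfs2Dates, dimmDates, dimVals,
                   window=dt.timedelta(minutes=5)):
    """One-pass window match on unsorted inputs: sort (date, index) pairs,
    walk both sorted lists once, and write results at the original indexes."""
    data = [0] * len(pwfs2Dates)
    pairs = sorted((date, idx) for idx, date in enumerate(pwfs2Dates))
    dimm = sorted((date, idx) for idx, date in enumerate(dimmDates))
    pdates = iter(pairs)
    try:
        datei, i = next(pdates)
        for datej, j in dimm:
            # Skip pwfs2 dates that fall before this dimm date's window
            while datei < datej - window:
                datei, i = next(pdates)
            # Assign this dimm value to every pwfs2 date inside the window
            while datei < datej + window:
                data[i] = dimVals[j]
                datei, i = next(pdates)
    except StopIteration:
        pass  # every pwfs2 date has been visited
    return data
```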
    


    1. Note that dimVals, defined as

      dimVals = np.array(dict1[dimmKey]['values'])

      is not used in your code and can be removed.

    2. Note that your code becomes much simpler if you loop over the
      array itself instead of over an index with xrange.
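To illustrate point 2 with a hypothetical task (collecting the values whose date is at or after a cutoff; the function names are made up for this example): range(len(...)) is the Python 3 spelling of the xrange pattern, while zip iterates over the dates and values together without any index bookkeeping. Both functions return the same result.

```python
def values_after_indexed(dates, values, cutoff):
    """Index-based loop: range(len(...)) is Python 3's equivalent of xrange."""
    result = []
    for k in range(len(dates)):
        if dates[k] >= cutoff:
            result.append(values[k])
    return result

def values_after_direct(dates, values, cutoff):
    """Loop over the sequences themselves; zip pairs each date with its value."""
    return [value for date, value in zip(dates, values) if date >= cutoff]
```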

    Edit: The answer from unutbu addresses some weak points in the original version of the code above.
    I list them here for completeness:

    1. Use of next: next(iterator) is preferred to iterator.next().
      iterator.next() was an exception to a conventional naming rule;
      this was fixed in Python 3 (py3k), where the method was renamed
      iterator.__next__().
    2. Check for the end of the iterator with try/except. Once all the
      items in the iterator have been consumed, the next call to next()
      raises a StopIteration exception. Use try/except to cleanly
      break out of the loop when that happens. This matters even when the
      two arrays are the same size: the inner while loops advance the
      iterator independently of the for loop, so the iterator can be
      exhausted first (for example, as soon as the last pwfs2 date has been
      matched). It is even more likely when dict1 and dict2 are not the
      same size.
      The open question is which is better: to use try/except, or to prepare
      the arrays before looping by trimming them to the shorter one.
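One option that avoids both try/except and pre-trimming, sketched here as an assumption rather than what either answer proposed: the built-in next() accepts a default, so next(iterator, None) returns None instead of raising StopIteration, and the loops can test for that sentinel. The function name match_sorted is hypothetical, and both date sequences are assumed sorted ascending.

```python
import datetime as dt

def match_sorted(pwfs2Dates, dimmDates, dimVals,
                 window=dt.timedelta(minutes=5)):
    """Sorted-input window match without try/except: next(it, None)
    returns None once the iterator is exhausted."""
    data = [0] * len(pwfs2Dates)
    pdates = enumerate(pwfs2Dates)   # enumerate already returns an iterator
    current = next(pdates, None)     # (index, date), or None when exhausted
    for datej, valuej in zip(dimmDates, dimVals):
        while current is not None and current[1] < datej - window:
            current = next(pdates, None)
        while current is not None and current[1] < datej + window:
            data[current[0]] = valuej
            current = next(pdates, None)
        if current is None:
            break  # every pwfs2 date has been visited
    return data
```

Whether this reads better than try/except is largely a matter of taste; the sentinel version makes the exhaustion check explicit at each step.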


Related Questions

I'm trying to compare two arrays both 2d, I need it only match when
Not sure how to go about this... But, I have two arrays, one with
I need to match up two almost-the-same long freetext strings; i.e., to find index-to-index
I need to match input strings (URLs) against a large set (anywhere from 1k-250k)
I need to have two separate pages on the site I'm planning to build
I need to match two specific words with 30 (or less) characters in between.
I have a text file that contains very long lines. I need one piece
I need to match and remove all tags using a regular expression in Perl.
I need to match something in the form <a href=pic/5 id=piclink><img src=thumb/5 /></a> to
I need to match (case insensitive) abcd and an optional trademark symbol Regex: /abcd(™)?/gi

© 2021 The Archive Base. All Rights Reserved