The Archive Base

Editorial Team
Asked: May 23, 2026

I have two sets of temperature data, which have readings at regular (but different) time intervals. I’m trying to get the correlation between these two sets of data.

I’ve been experimenting with pandas to do this. I’ve created two time series and am calling TimeSeriesA.corr(TimeSeriesB). However, if the timestamps in the two series do not match exactly (they’re generally off by a few seconds), I get a null result. I could get a decent answer if I could:

a) interpolate or fill in the missing times in each time series (I know this is possible in pandas, I just don’t know how to do it)

b) strip the seconds out of the Python datetime objects (set seconds to 00 without changing the minutes). I’d lose some accuracy, but not a huge amount

c) use something else in pandas to get the correlation between two time series

d) use something in Python to get the correlation between two lists of floats, each float having a corresponding datetime object, taking the time into account.

Anyone have any suggestions?

1 Answer

  1. Editorial Team added an answer on May 23, 2026 at 9:38 am

    You have a number of options using pandas, but you have to make a decision about how it makes sense to align the data given that they don’t occur at the same instants.

    Use the values “as of” the times in one of the time series. Here’s an example:

        In [15]: ts
        Out[15]: 
        2000-01-03 00:00:00    -0.722808451504
        2000-01-04 00:00:00    0.0125041039477
        2000-01-05 00:00:00    0.777515530539
        2000-01-06 00:00:00    -0.35714026263
        2000-01-07 00:00:00    -1.55213541118
        2000-01-10 00:00:00    -0.508166334892
        2000-01-11 00:00:00    0.58016097981
        2000-01-12 00:00:00    1.50766289013
        2000-01-13 00:00:00    -1.11114968643
        2000-01-14 00:00:00    0.259320239297
    
    
    
        In [16]: ts2
        Out[16]: 
        2000-01-03 00:00:30    1.05595278907
        2000-01-04 00:00:30    -0.568961755792
        2000-01-05 00:00:30    0.660511172645
        2000-01-06 00:00:30    -0.0327384421979
        2000-01-07 00:00:30    0.158094407533
        2000-01-10 00:00:30    -0.321679671377
        2000-01-11 00:00:30    0.977286027619
        2000-01-12 00:00:30    -0.603541295894
        2000-01-13 00:00:30    1.15993249209
        2000-01-14 00:00:30    -0.229379534767
    

    As you can see, these are off by 30 seconds. The reindex method lets you align the data while filling values forward (getting the “as of” value):

        In [17]: ts.reindex(ts2.index, method='pad')
        Out[17]: 
        2000-01-03 00:00:30    -0.722808451504
        2000-01-04 00:00:30    0.0125041039477
        2000-01-05 00:00:30    0.777515530539
        2000-01-06 00:00:30    -0.35714026263
        2000-01-07 00:00:30    -1.55213541118
        2000-01-10 00:00:30    -0.508166334892
        2000-01-11 00:00:30    0.58016097981
        2000-01-12 00:00:30    1.50766289013
        2000-01-13 00:00:30    -1.11114968643
        2000-01-14 00:00:30    0.259320239297
    
        In [18]: ts2.corr(ts.reindex(ts2.index, method='pad'))
        Out[18]: -0.31004148593302283
    

    Note that ‘pad’ is also aliased as ‘ffill’ (but, as of this writing, only in the very latest version of pandas on GitHub).
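
The “as of” alignment above can be reproduced end to end with a small script. This is only a sketch: the random values, the seed, and the variable names `direct`, `aligned`, and `as_of_corr` are made up for illustration and are not the data shown in the session above. It assumes a pandas version in which `method="ffill"` is accepted by `reindex`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ten business days, with the second series stamped 30 s later.
idx = pd.bdate_range("2000-01-03", periods=10)
rng = np.random.default_rng(0)
ts = pd.Series(rng.standard_normal(10), index=idx)
ts2 = pd.Series(rng.standard_normal(10), index=idx + pd.Timedelta(seconds=30))

# No timestamps coincide, so a direct correlation has nothing to pair up
# and comes back as NaN.
direct = ts.corr(ts2)

# Forward-fill ts onto the timestamps of ts2 (the "as of" value), then correlate.
aligned = ts.reindex(ts2.index, method="ffill")
as_of_corr = ts2.corr(aligned)

print(direct, as_of_corr)
```

Forward filling needs the index being reindexed to be sorted, and any timestamp in `ts2` that precedes the first reading in `ts` would come back as NaN, so it may be worth checking `aligned.isna().sum()` on real data.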

    Strip the seconds out of all your datetimes. The best way to do this is to use rename:

        In [25]: ts2.rename(lambda date: date.replace(second=0))
        Out[25]: 
        2000-01-03 00:00:00    1.05595278907
        2000-01-04 00:00:00    -0.568961755792
        2000-01-05 00:00:00    0.660511172645
        2000-01-06 00:00:00    -0.0327384421979
        2000-01-07 00:00:00    0.158094407533
        2000-01-10 00:00:00    -0.321679671377
        2000-01-11 00:00:00    0.977286027619
        2000-01-12 00:00:00    -0.603541295894
        2000-01-13 00:00:00    1.15993249209
        2000-01-14 00:00:00    -0.229379534767
    

    Note that, in the pandas version this answer was written against, an exception is raised if rename produces duplicate dates.
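
A minimal sketch of the seconds-stripping step, using hypothetical values. It shows the `rename` approach from the answer next to a vectorised alternative, `DatetimeIndex.floor("min")`, which is not part of the original answer. The two agree here, but `floor` also drops sub-second components, whereas `replace(second=0)` leaves microseconds untouched. The explicit duplicate check is an added safeguard, not something the answer prescribes.

```python
import pandas as pd

# Hypothetical series whose timestamps carry a 30-second offset.
idx = pd.bdate_range("2000-01-03", periods=5) + pd.Timedelta(seconds=30)
ts2 = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# Approach from the answer: relabel each timestamp with seconds set to zero.
renamed = ts2.rename(lambda stamp: stamp.replace(second=0))

# Vectorised alternative: round every timestamp down to the whole minute.
floored = ts2.copy()
floored.index = ts2.index.floor("min")

# Guard against two readings collapsing onto the same minute.
if floored.index.has_duplicates:
    raise ValueError("truncating seconds produced duplicate timestamps")

print(floored)
```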

    For something a little more advanced, suppose you wanted to correlate the mean value for each minute (where you have multiple observations per minute):

        In [31]: ts_mean = ts.groupby(lambda date: date.replace(second=0)).mean()
    
        In [32]: ts2_mean = ts2.groupby(lambda date: date.replace(second=0)).mean()
    
        In [33]: ts_mean.corr(ts2_mean)
        Out[33]: -0.31004148593302283
    

    These last code snippets may not work if you don’t have the latest code from https://github.com/wesm/pandas. If .mean() doesn’t work on a GroupBy object as shown above, try .agg(np.mean).
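
The per-minute averaging can be sketched as a standalone script. The sampling intervals (20 s and 15 s), the random values, and the names `a`, `b`, and `minute_corr` are assumptions chosen only so that each minute holds several readings. Grouping by `index.floor("min")` is used here in place of the lambda from the answer, and `.agg("mean")` stands in for the `.agg(np.mean)` fallback mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical readings over ten minutes: series a every 20 s, series b every 15 s.
rng = np.random.default_rng(1)
a = pd.Series(
    rng.standard_normal(30),
    index=pd.date_range("2000-01-03", periods=30, freq=pd.Timedelta(seconds=20)),
)
b = pd.Series(
    rng.standard_normal(40),
    index=pd.date_range("2000-01-03 00:00:05", periods=40, freq=pd.Timedelta(seconds=15)),
)

# Average all readings that fall within the same minute.
a_mean = a.groupby(a.index.floor("min")).mean()
b_mean = b.groupby(b.index.floor("min")).mean()

# Both results are now indexed by whole minutes, so corr can pair them up.
minute_corr = a_mean.corr(b_mean)

# Equivalent aggregation spelled with agg, as a fallback.
fallback = a.groupby(a.index.floor("min")).agg("mean")

print(minute_corr)
```

Averaging per minute avoids the duplicate-label problem that plain truncation can cause, because several readings in the same minute are collapsed into one value before the two series are compared.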

    Hope this helps!

© 2021 The Archive Base. All Rights Reserved