The Archive Base


Editorial Team
Asked: May 31, 2026

This is my first time trying Pandas. I think I have a reasonable use

This is my first time trying Pandas. I think I have a reasonable use case, but I am stumbling. I want to load a tab delimited file into a Pandas DataFrame, then group it by Symbol and plot it with the x-axis indexed by the TimeStamp column. Here is a subset of the data:

Symbol,Price,M1,M2,Volume,TimeStamp
TBET,2.19,3,8.05,1124179,9:59:14 AM
FUEL,3.949,9,1.15,109674,9:59:11 AM
SUNH,4.37,6,0.09,24394,9:59:09 AM
FUEL,3.9099,8,1.11,105265,9:59:09 AM
TBET,2.18,2,8.03,1121629,9:59:05 AM
ORBC,3.4,2,0.22,10509,9:59:02 AM
FUEL,3.8599,7,1.07,102116,9:58:47 AM
FUEL,3.8544,6,1.05,100116,9:58:40 AM
GBR,3.83,4,0.46,64251,9:58:24 AM
GBR,3.8,3,0.45,63211,9:58:20 AM
XRA,3.6167,3,0.12,42310,9:58:08 AM
GBR,3.75,2,0.34,47521,9:57:52 AM
MPET,1.42,3,0.26,44600,9:57:52 AM

Note two things about the TimeStamp column:

  1. it has duplicate values and
  2. the intervals are irregular.

I thought I could do something like this…

from pandas import *
import pylab as plt

df = read_csv('data.txt',index_col=5)
df.sort(ascending=False)

df.plot()
plt.show()

But the read_csv method raises an exception “Tried columns 1-X as index but found duplicates”. Is there an option that will allow me to specify an index column with duplicate values?

I would also be interested in aligning my irregular timestamp intervals to one-second resolution. I would still want to plot multiple events for a given second, but maybe I could introduce a unique index and then align my prices to it?

1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 4:29 am

    I created several issues just now to address some features / conveniences that I think would be nice to have: GH-856, GH-857, GH-858

    We’re currently working on a revamp of the time series capabilities, and aligning to one-second resolution is possible now (though not with duplicates, so you would need to write some functions for that). I also want to support duplicate timestamps in a better way. However, this is really panel (3D) data, so one way that you might alter things is the following:

    In [29]: df.pivot('Symbol', 'TimeStamp').stack()
    Out[29]: 
                       M1    M2   Price   Volume
    Symbol TimeStamp                            
    FUEL   9:58:40 AM   6  1.05  3.8544   100116
           9:58:47 AM   7  1.07  3.8599   102116
           9:59:09 AM   8  1.11  3.9099   105265
           9:59:11 AM   9  1.15  3.9490   109674
    GBR    9:57:52 AM   2  0.34  3.7500    47521
           9:58:20 AM   3  0.45  3.8000    63211
           9:58:24 AM   4  0.46  3.8300    64251
    MPET   9:57:52 AM   3  0.26  1.4200    44600
    ORBC   9:59:02 AM   2  0.22  3.4000    10509
    SUNH   9:59:09 AM   6  0.09  4.3700    24394
    TBET   9:59:05 AM   2  8.03  2.1800  1121629
           9:59:14 AM   3  8.05  2.1900  1124179
    XRA    9:58:08 AM   3  0.12  3.6167    42310
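The session above reflects the pandas release current when the answer was written. As a minimal sketch, assuming a more recent pandas in which `pivot` takes keyword arguments, the same reshaping might look like the following. The `dropna` and `sort_index` calls are additions of this sketch, there only because `stack` does not treat missing combinations or ordering the same way in every release.

```python
import io

import pandas as pd

# Sample rows copied from the question (comma-separated).
DATA = """\
Symbol,Price,M1,M2,Volume,TimeStamp
TBET,2.19,3,8.05,1124179,9:59:14 AM
FUEL,3.949,9,1.15,109674,9:59:11 AM
SUNH,4.37,6,0.09,24394,9:59:09 AM
FUEL,3.9099,8,1.11,105265,9:59:09 AM
TBET,2.18,2,8.03,1121629,9:59:05 AM
ORBC,3.4,2,0.22,10509,9:59:02 AM
FUEL,3.8599,7,1.07,102116,9:58:47 AM
FUEL,3.8544,6,1.05,100116,9:58:40 AM
GBR,3.83,4,0.46,64251,9:58:24 AM
GBR,3.8,3,0.45,63211,9:58:20 AM
XRA,3.6167,3,0.12,42310,9:58:08 AM
GBR,3.75,2,0.34,47521,9:57:52 AM
MPET,1.42,3,0.26,44600,9:57:52 AM
"""

df = pd.read_csv(io.StringIO(DATA))

# pivot() spreads TimeStamp across the columns; stack() then moves that
# column level back into the rows, giving a (Symbol, TimeStamp) MultiIndex.
stacked = (
    df.pivot(index="Symbol", columns="TimeStamp")
    .stack()
    .dropna(how="all")  # drop (Symbol, TimeStamp) pairs that never occurred
    .sort_index()       # make the row order independent of the pandas version
)
print(stacked)
```

Note that `pivot` raises an error if any (Symbol, TimeStamp) pair occurs more than once, so this only works while each symbol has at most one row per timestamp.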
    

    Note that this created a MultiIndex. Another way I could have gotten this:

    In [32]: df.set_index(['Symbol', 'TimeStamp'])
    Out[32]: 
                        Price  M1    M2   Volume
    Symbol TimeStamp                            
    TBET   9:59:14 AM  2.1900   3  8.05  1124179
    FUEL   9:59:11 AM  3.9490   9  1.15   109674
    SUNH   9:59:09 AM  4.3700   6  0.09    24394
    FUEL   9:59:09 AM  3.9099   8  1.11   105265
    TBET   9:59:05 AM  2.1800   2  8.03  1121629
    ORBC   9:59:02 AM  3.4000   2  0.22    10509
    FUEL   9:58:47 AM  3.8599   7  1.07   102116
           9:58:40 AM  3.8544   6  1.05   100116
    GBR    9:58:24 AM  3.8300   4  0.46    64251
           9:58:20 AM  3.8000   3  0.45    63211
    XRA    9:58:08 AM  3.6167   3  0.12    42310
    GBR    9:57:52 AM  3.7500   2  0.34    47521
    MPET   9:57:52 AM  1.4200   3  0.26    44600
    
    In [33]: df.set_index(['Symbol', 'TimeStamp']).sortlevel(0)
    Out[33]: 
                        Price  M1    M2   Volume
    Symbol TimeStamp                            
    FUEL   9:58:40 AM  3.8544   6  1.05   100116
           9:58:47 AM  3.8599   7  1.07   102116
           9:59:09 AM  3.9099   8  1.11   105265
           9:59:11 AM  3.9490   9  1.15   109674
    GBR    9:57:52 AM  3.7500   2  0.34    47521
           9:58:20 AM  3.8000   3  0.45    63211
           9:58:24 AM  3.8300   4  0.46    64251
    MPET   9:57:52 AM  1.4200   3  0.26    44600
    ORBC   9:59:02 AM  3.4000   2  0.22    10509
    SUNH   9:59:09 AM  4.3700   6  0.09    24394
    TBET   9:59:05 AM  2.1800   2  8.03  1121629
           9:59:14 AM  2.1900   3  8.05  1124179
    XRA    9:58:08 AM  3.6167   3  0.12    42310
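`sortlevel` was removed in later pandas releases. A rough equivalent, assuming a recent pandas, is `sort_index(level=0)`. The sketch below also loads the data with `read_csv` from an in-memory string, which stands in for the `data.txt` file in the question.

```python
import io

import pandas as pd

# Sample rows copied from the question (comma-separated).
DATA = """\
Symbol,Price,M1,M2,Volume,TimeStamp
TBET,2.19,3,8.05,1124179,9:59:14 AM
FUEL,3.949,9,1.15,109674,9:59:11 AM
SUNH,4.37,6,0.09,24394,9:59:09 AM
FUEL,3.9099,8,1.11,105265,9:59:09 AM
TBET,2.18,2,8.03,1121629,9:59:05 AM
ORBC,3.4,2,0.22,10509,9:59:02 AM
FUEL,3.8599,7,1.07,102116,9:58:47 AM
FUEL,3.8544,6,1.05,100116,9:58:40 AM
GBR,3.83,4,0.46,64251,9:58:24 AM
GBR,3.8,3,0.45,63211,9:58:20 AM
XRA,3.6167,3,0.12,42310,9:58:08 AM
GBR,3.75,2,0.34,47521,9:57:52 AM
MPET,1.42,3,0.26,44600,9:57:52 AM
"""

df = pd.read_csv(io.StringIO(DATA))

# Build the (Symbol, TimeStamp) MultiIndex, then sort by Symbol first;
# the remaining level (TimeStamp) is sorted within each symbol by default.
indexed = df.set_index(["Symbol", "TimeStamp"]).sort_index(level=0)
print(indexed)

# Rows for a single symbol can then be selected by the first index level.
print(indexed.loc["FUEL"])
```

Unlike `pivot`, `set_index` does not require the index entries to be unique, so this route still works when a symbol has repeated timestamps.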
    

    You can get this data in a true panel format like so:

    In [35]: df.set_index(['TimeStamp', 'Symbol']).sortlevel(0).to_panel()
    Out[35]: 
    <class 'pandas.core.panel.Panel'>
    Dimensions: 4 (items) x 11 (major) x 7 (minor)
    Items: Price to Volume
    Major axis: 9:57:52 AM to 9:59:14 AM
    Minor axis: FUEL to XRA
    
    In [36]: panel = df.set_index(['TimeStamp', 'Symbol']).sortlevel(0).to_panel()
    
    In [37]: panel['Price']
    Out[37]: 
    Symbol        FUEL   GBR  MPET  ORBC  SUNH  TBET     XRA
    TimeStamp                                               
    9:57:52 AM     NaN  3.75  1.42   NaN   NaN   NaN     NaN
    9:58:08 AM     NaN   NaN   NaN   NaN   NaN   NaN  3.6167
    9:58:20 AM     NaN  3.80   NaN   NaN   NaN   NaN     NaN
    9:58:24 AM     NaN  3.83   NaN   NaN   NaN   NaN     NaN
    9:58:40 AM  3.8544   NaN   NaN   NaN   NaN   NaN     NaN
    9:58:47 AM  3.8599   NaN   NaN   NaN   NaN   NaN     NaN
    9:59:02 AM     NaN   NaN   NaN   3.4   NaN   NaN     NaN
    9:59:05 AM     NaN   NaN   NaN   NaN   NaN  2.18     NaN
    9:59:09 AM  3.9099   NaN   NaN   NaN  4.37   NaN     NaN
    9:59:11 AM  3.9490   NaN   NaN   NaN   NaN   NaN     NaN
    9:59:14 AM     NaN   NaN   NaN   NaN   NaN  2.19     NaN
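`Panel` and `to_panel` were removed in later pandas releases. If only one measure is needed at a time, a wide table like `panel['Price']` can be sketched with `pivot` directly, assuming a recent pandas:

```python
import io

import pandas as pd

# Sample rows copied from the question (comma-separated).
DATA = """\
Symbol,Price,M1,M2,Volume,TimeStamp
TBET,2.19,3,8.05,1124179,9:59:14 AM
FUEL,3.949,9,1.15,109674,9:59:11 AM
SUNH,4.37,6,0.09,24394,9:59:09 AM
FUEL,3.9099,8,1.11,105265,9:59:09 AM
TBET,2.18,2,8.03,1121629,9:59:05 AM
ORBC,3.4,2,0.22,10509,9:59:02 AM
FUEL,3.8599,7,1.07,102116,9:58:47 AM
FUEL,3.8544,6,1.05,100116,9:58:40 AM
GBR,3.83,4,0.46,64251,9:58:24 AM
GBR,3.8,3,0.45,63211,9:58:20 AM
XRA,3.6167,3,0.12,42310,9:58:08 AM
GBR,3.75,2,0.34,47521,9:57:52 AM
MPET,1.42,3,0.26,44600,9:57:52 AM
"""

df = pd.read_csv(io.StringIO(DATA))

# One row per timestamp, one column per symbol, Price as the cell value.
# Cells are NaN where a symbol has no observation at that timestamp.
prices = df.pivot(index="TimeStamp", columns="Symbol", values="Price")
print(prices)
```

With matplotlib installed, `prices.plot(marker="o")` should then draw one series per symbol. The markers matter because most cells are NaN, and isolated points would otherwise not show up as line segments.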
    

    You can then generate some plots from that data.

    Note here that the timestamps are still strings. I guess they could be converted to Python datetime.time objects, and things might be a bit easier to work with. I don’t have many plans to provide a lot of support for raw times vs. timestamps (date + time), but if enough people need it I suppose I can be convinced 🙂
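One way the string conversion mentioned above might be done is with `pd.to_datetime` and an explicit format. This is a sketch, not part of the original answer. The `Time` and `When` column names and the `TRADE_DATE` value are hypothetical, since the source data carries no date.

```python
import datetime
import io

import pandas as pd

# A few rows copied from the question (comma-separated).
DATA = """\
Symbol,Price,M1,M2,Volume,TimeStamp
TBET,2.19,3,8.05,1124179,9:59:14 AM
FUEL,3.949,9,1.15,109674,9:59:11 AM
SUNH,4.37,6,0.09,24394,9:59:09 AM
GBR,3.75,2,0.34,47521,9:57:52 AM
"""

df = pd.read_csv(io.StringIO(DATA))

# Parse the 12-hour clock strings. With no date in the format,
# pandas fills in 1900-01-01 as the date part.
parsed = pd.to_datetime(df["TimeStamp"], format="%I:%M:%S %p")

# Raw time-of-day objects (datetime.time), as the answer suggests.
df["Time"] = parsed.dt.time

# Full timestamps, if the trading date is known. TRADE_DATE is a
# placeholder assumption, not a value taken from the question.
TRADE_DATE = "2000-01-03"
df["When"] = pd.to_datetime(
    TRADE_DATE + " " + df["TimeStamp"], format="%Y-%m-%d %I:%M:%S %p"
)
print(df[["Symbol", "Time", "When"]])
```

Full timestamps are usually the more convenient of the two, because pandas time series features such as resampling work on a datetime index rather than on bare `datetime.time` values.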

    If you have multiple observations in the same second for a single symbol, then some of the above methods will not work. But I want to build in better support for that in upcoming releases of pandas, so knowing your use cases will be helpful to me. Consider joining the mailing list (pystatsmodels).
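For the duplicate case, here is a sketch of two possible workarounds in a recent pandas; neither is part of the original answer. The first keeps every event by adding a per-group counter, which is the "unique index" idea from the question. The second collapses to one value per symbol per second and aligns it to a regular one-second grid. The `seq` and `When` names, the `TRADE_DATE` placeholder, and the choice of `mean` as the aggregate are all assumptions of this sketch. The duplicate is simulated by repeating an existing row.

```python
import io

import pandas as pd

# Sample rows copied from the question (comma-separated).
DATA = """\
Symbol,Price,M1,M2,Volume,TimeStamp
TBET,2.19,3,8.05,1124179,9:59:14 AM
FUEL,3.949,9,1.15,109674,9:59:11 AM
SUNH,4.37,6,0.09,24394,9:59:09 AM
FUEL,3.9099,8,1.11,105265,9:59:09 AM
TBET,2.18,2,8.03,1121629,9:59:05 AM
ORBC,3.4,2,0.22,10509,9:59:02 AM
FUEL,3.8599,7,1.07,102116,9:58:47 AM
FUEL,3.8544,6,1.05,100116,9:58:40 AM
GBR,3.83,4,0.46,64251,9:58:24 AM
GBR,3.8,3,0.45,63211,9:58:20 AM
XRA,3.6167,3,0.12,42310,9:58:08 AM
GBR,3.75,2,0.34,47521,9:57:52 AM
MPET,1.42,3,0.26,44600,9:57:52 AM
"""

df = pd.read_csv(io.StringIO(DATA))

# Simulate a duplicate by repeating the FUEL row at 9:59:11 AM.
df = pd.concat([df, df.iloc[[1]]], ignore_index=True)

# TRADE_DATE is a placeholder assumption; the source data has no date.
TRADE_DATE = "2000-01-03"
df["When"] = pd.to_datetime(
    TRADE_DATE + " " + df["TimeStamp"], format="%Y-%m-%d %I:%M:%S %p"
)

# Option 1: keep every event and number repeats within (Symbol, When),
# so that (Symbol, When, seq) is a unique index.
df["seq"] = df.groupby(["Symbol", "When"]).cumcount()
events = df.set_index(["Symbol", "When", "seq"]).sort_index()

# Option 2: one price per symbol per second (mean of any repeats),
# then reindex onto a regular one-second grid; empty seconds are NaN.
per_second = df.groupby(["When", "Symbol"])["Price"].mean().unstack("Symbol")
grid = per_second.resample("1s").mean()
print(grid.dropna(how="all"))
```

Which aggregate is appropriate (mean, last, a volume-weighted price, and so on) depends on the use case; `mean` is only an illustration.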

© 2021 The Archive Base. All Rights Reserved