The Archive Base

Editorial Team
Asked: June 1, 2026

Could someone please point me in the right direction with respect to OHLC data


Could someone please point me in the right direction with respect to OHLC data timeframe conversion with pandas? What I’m trying to do is build a DataFrame with data for a higher timeframe, given data at a lower timeframe.

For example, given I have the following one-minute (M1) data:

                       Open    High     Low   Close  Volume
Date                                                       
1999-01-04 10:22:00  1.1801  1.1819  1.1801  1.1817       4
1999-01-04 10:23:00  1.1817  1.1818  1.1804  1.1814      18
1999-01-04 10:24:00  1.1817  1.1817  1.1802  1.1806      12
1999-01-04 10:25:00  1.1807  1.1815  1.1795  1.1808      26
1999-01-04 10:26:00  1.1803  1.1806  1.1790  1.1806       4
1999-01-04 10:27:00  1.1801  1.1801  1.1779  1.1786      23
1999-01-04 10:28:00  1.1795  1.1801  1.1776  1.1788      28
1999-01-04 10:29:00  1.1793  1.1795  1.1782  1.1789      10
1999-01-04 10:31:00  1.1780  1.1792  1.1776  1.1792      12
1999-01-04 10:32:00  1.1788  1.1792  1.1788  1.1791       4

which has Open, High, Low, Close (OHLC) and Volume values for every minute, I would like to build a set of 5-minute readings (M5), which would look like this:

                       Open    High     Low   Close  Volume
Date                                                       
1999-01-04 10:25:00  1.1807  1.1815  1.1776  1.1789      91
1999-01-04 10:30:00  1.1780  1.1792  1.1776  1.1791      16

So the workflow is:

  • Open is the Open of the first row in the time window
  • High is the highest High in the time window
  • Low is the lowest Low in the time window
  • Close is the last Close in the time window
  • Volume is the sum of the Volumes in the time window
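A minimal sketch of these five rules, assuming a recent pandas release (the traceback below comes from a much older one). It uses `DataFrame.resample` with one named reduction per column; `m1` and `ohlcv_rules` are illustrative names, and the rows are the M1 sample from the question:

```python
import pandas as pd

# The M1 sample from the question.
m1 = pd.DataFrame(
    {
        "Open":   [1.1801, 1.1817, 1.1817, 1.1807, 1.1803, 1.1801, 1.1795, 1.1793, 1.1780, 1.1788],
        "High":   [1.1819, 1.1818, 1.1817, 1.1815, 1.1806, 1.1801, 1.1801, 1.1795, 1.1792, 1.1792],
        "Low":    [1.1801, 1.1804, 1.1802, 1.1795, 1.1790, 1.1779, 1.1776, 1.1782, 1.1776, 1.1788],
        "Close":  [1.1817, 1.1814, 1.1806, 1.1808, 1.1806, 1.1786, 1.1788, 1.1789, 1.1792, 1.1791],
        "Volume": [4, 18, 12, 26, 4, 23, 28, 10, 12, 4],
    },
    index=pd.DatetimeIndex(
        ["1999-01-04 10:22", "1999-01-04 10:23", "1999-01-04 10:24",
         "1999-01-04 10:25", "1999-01-04 10:26", "1999-01-04 10:27",
         "1999-01-04 10:28", "1999-01-04 10:29", "1999-01-04 10:31",
         "1999-01-04 10:32"],
        name="Date",
    ),
)

# One reduction per column, exactly as in the list above.
ohlcv_rules = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}

# '5min' bins are aligned to midnight by default, so they start at round times
# (10:20, 10:25, 10:30), and each bin is labelled with its left edge.
m5 = m1.resample("5min").agg(ohlcv_rules)
print(m5)
```

With this data the result should have rows labelled 10:20, 10:25 and 10:30: the last two match the M5 table above, and the 10:20 row is the incomplete first window. Because the reductions are passed as strings rather than Python lambdas, pandas can use its built-in implementations, which matters with millions of rows.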

There are a few issues, though:

  • the data has gaps (note there is no 10:30:00 row)
  • the 5-minute intervals have to start at round times, e.g. M5 starts at 10:25:00, not 10:22:00
  • the first, incomplete set can be omitted, as in this example, or included (so we could have a 10:20:00 5-minute entry)
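The gap and alignment issues can be handled on top of the same resampling. In this sketch, `to_ohlcv` and its `drop_partial_first` flag are hypothetical names; it assumes a sorted DatetimeIndex and an intraday rule such as `'5min'`, for which pandas aligns bins to midnight and labels each bin by its left edge. It drops windows that contain no rows at all and, optionally, the first window when the data starts part-way through it:

```python
import pandas as pd

# First Open, highest High, lowest Low, last Close, summed Volume.
OHLCV_RULES = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}


def to_ohlcv(bars: pd.DataFrame, rule: str = "5min", drop_partial_first: bool = True) -> pd.DataFrame:
    """Downsample OHLCV bars (hypothetical helper; assumes a sorted DatetimeIndex
    and an intraday rule whose bins are labelled by their left edge)."""
    out = bars.resample(rule).agg(OHLCV_RULES)

    # The first bin always contains the first row; if its label is earlier
    # than that row, the data starts part-way through the window.
    starts_mid_window = bars.index[0] > out.index[0]

    # A window with no rows at all (a gap covering the whole window) comes back
    # with NaN prices and Volume 0, so drop those rows.
    out = out.dropna(subset=["Open"])

    if drop_partial_first and starts_mid_window:
        out = out.iloc[1:]
    return out
```

A window that is only thinned out by a gap (such as the 10:30 window, which has no 10:30:00 row) is kept, as in the question's expected output.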

The pandas documentation on up- and downsampling gives an example, but it uses the mean as the value of each resampled row, which won’t work here. I have tried using groupby and agg, but to no avail. Getting the highest High and the lowest Low might not be so hard, but I have no idea how to get the first Open and the last Close.
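Since the question also tried `groupby` and `agg`, here is a sketch of that route, assuming pandas 0.25 or later (for named aggregation). Each timestamp is floored to the start of its 5-minute window and used as the group key, and the `first` and `last` reductions supply the Open and Close. Only windows that actually contain rows become groups, so gaps produce no empty rows. The frame holds a subset of the question's rows:

```python
import pandas as pd

# Rows 10:27 to 10:32 of the question's M1 data (note the missing 10:30 row).
rows = [
    ("1999-01-04 10:27", 1.1801, 1.1801, 1.1779, 1.1786, 23),
    ("1999-01-04 10:28", 1.1795, 1.1801, 1.1776, 1.1788, 28),
    ("1999-01-04 10:29", 1.1793, 1.1795, 1.1782, 1.1789, 10),
    ("1999-01-04 10:31", 1.1780, 1.1792, 1.1776, 1.1792, 12),
    ("1999-01-04 10:32", 1.1788, 1.1792, 1.1788, 1.1791, 4),
]
m1 = pd.DataFrame(rows, columns=["Date", "Open", "High", "Low", "Close", "Volume"])
m1["Date"] = pd.to_datetime(m1["Date"])
m1 = m1.set_index("Date")

# Group key: each timestamp floored to the start of its 5-minute window.
key = m1.index.floor("5min")

# Named aggregation: output column = (input column, reduction).
m5 = m1.groupby(key).agg(
    Open=("Open", "first"),
    High=("High", "max"),
    Low=("Low", "min"),
    Close=("Close", "last"),
    Volume=("Volume", "sum"),
)
print(m5)
```

Note that `first` and `last` skip missing values, so a NaN Open in the first row of a window would be replaced by the next non-missing Open; positional access such as `lambda s: s.iloc[0]` avoids that if it matters.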

What I tried is something along the lines of:

grouped = slice.groupby( dr5minute.asof ).agg( 
    { 'Low': lambda x : x.min()[ 'Low' ], 'High': lambda x : x.max()[ 'High' ] } 
)

but it results in the following error, which I don’t understand:

In [27]: grouped = slice.groupby( dr5minute.asof ).agg( { 'Low' : lambda x : x.min()[ 'Low' ], 'High' : lambda x : x.max()[ 'High' ] } )
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/work/python/fxcruncher/<ipython-input-27-df50f9522a2f> in <module>()
----> 1 grouped = slice.groupby( dr5minute.asof ).agg( { 'Low' : lambda x : x.min()[ 'Low' ], 'High' : lambda x : x.max()[ 'High' ] } )

/usr/lib/python2.7/site-packages/pandas/core/groupby.pyc in agg(self, func, *args, **kwargs)
    242         See docstring for aggregate
    243         """
--> 244         return self.aggregate(func, *args, **kwargs)
    245 
    246     def _iterate_slices(self):

/usr/lib/python2.7/site-packages/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   1153                     colg = SeriesGroupBy(obj[col], column=col,
   1154                                          grouper=self.grouper)
-> 1155                     result[col] = colg.aggregate(func)
   1156 
   1157             result = DataFrame(result)

/usr/lib/python2.7/site-packages/pandas/core/groupby.pyc in aggregate(self, func_or_funcs, *args, **kwargs)
    906                 return self._python_agg_general(func_or_funcs, *args, **kwargs)
    907             except Exception:
--> 908                 result = self._aggregate_named(func_or_funcs, *args, **kwargs)
    909 
    910             index = Index(sorted(result), name=self.grouper.names[0])

/usr/lib/python2.7/site-packages/pandas/core/groupby.pyc in _aggregate_named(self, func, *args, **kwargs)
    976             grp = self.get_group(name)
    977             grp.name = name
--> 978             output = func(grp, *args, **kwargs)
    979             if isinstance(output, np.ndarray):
    980                 raise Exception('Must produce aggregated value')

/work/python/fxcruncher/<ipython-input-27-df50f9522a2f> in <lambda>(x)
----> 1 grouped = slice.groupby( dr5minute.asof ).agg( { 'Low' : lambda x : x.min()[ 'Low' ], 'High' : lambda x : x.max()[ 'High' ] } )

IndexError: invalid index to scalar variable.

So any help with this would be greatly appreciated. If the path I chose is not going to work, please suggest another relatively efficient approach (I have millions of rows). Some resources on using pandas for financial data processing would also be nice.


1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 7:09 am

    Your approach is sound, but it fails because each function in the dict-of-functions
    passed to agg() receives a Series holding only the column named by its key. It is
    therefore not necessary to select the column label again: in x.min()['Low'], x.min()
    is already a scalar, and indexing that scalar with 'Low' is what raises the IndexError.
    With this fixed, and assuming groupby preserves order, you can take the first/last
    element of each Series to get the Open/Close (note: the groupby documentation at the
    time did not promise to preserve the order of the original data, though it did in
    practice; current pandas documentation states that groupby preserves the order of
    rows within each group).

    In [50]: df.groupby(dr5minute.asof).agg({'Low': lambda s: s.min(),
                                             'High': lambda s: s.max(),
                                             'Open': lambda s: s.iloc[0],    # first row of the window
                                             'Close': lambda s: s.iloc[-1],  # last row of the window
                                             'Volume': lambda s: s.sum()})
    Out[50]: 
                          Close    High     Low    Open  Volume
    key_0                                                      
    1999-01-04 10:20:00  1.1806  1.1819  1.1801  1.1801      34
    1999-01-04 10:25:00  1.1789  1.1815  1.1776  1.1807      91
    1999-01-04 10:30:00  1.1791  1.1792  1.1776  1.1780      16
    

    For reference, here is a table summarizing the expected input and output types of an
    aggregation function, based on the groupby object type and on how the aggregation
    function(s) are passed to agg() (as observed with the pandas version used in this answer):

                      agg() method     agg func    agg func          agg()
                      input type       accepts     returns           result
    GroupBy Object
    SeriesGroupBy     function         Series      value             Series
                      dict-of-funcs    Series      value             DataFrame, columns match dict keys
                      list-of-funcs    Series      value             DataFrame, columns match func names
    DataFrameGroupBy  function         DataFrame   Series/dict/ary   DataFrame, columns match original DataFrame
                      dict-of-funcs    Series      value             DataFrame, columns match dict keys, where dict keys must be columns in original DataFrame
                      list-of-funcs    Series      value             DataFrame, MultiIndex columns (original cols x func names)
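The dict-of-funcs and list-of-funcs rows for DataFrameGroupBy can be checked with a short sketch, assuming a recent pandas release; `bars` holds four of the question's rows, trimmed to the Low and High columns:

```python
import pandas as pd

# Four of the question's M1 rows, trimmed to the Low and High columns.
bars = pd.DataFrame(
    {"Low": [1.1801, 1.1804, 1.1795, 1.1790], "High": [1.1819, 1.1818, 1.1815, 1.1806]},
    index=pd.to_datetime(["1999-01-04 10:22", "1999-01-04 10:23",
                          "1999-01-04 10:25", "1999-01-04 10:26"]),
)
grouped = bars.groupby(bars.index.floor("5min"))   # a DataFrameGroupBy

# dict-of-funcs: the function under each key is applied to that column only,
# and the result has one column per key.
by_dict = grouped.agg({"Low": "min", "High": "max"})

# list-of-funcs: every function is applied to every column, giving
# MultiIndex columns of (original column, function name).
by_list = grouped.agg(["min", "max"])

print(by_dict)
print(by_list)
```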
    

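    From the above table, if an aggregation requires access to more than one column, the
    only option is to pass a single function to a DataFrameGroupBy object. (In recent
    pandas releases, agg() applies a single user-defined function to each column separately
    instead; groupby(...).apply() is the method that receives each group as a whole
    DataFrame.) Therefore, an alternative way to accomplish the original task is to define
    a function like the following: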

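    def ohlcsum(df):
        # each group arrives as a DataFrame; sort by the timestamp index so that
        # the first/last rows are the earliest/latest minutes of the window
        # (DataFrame.sort() no longer exists; sort_index() does the same here)
        df = df.sort_index()
        return {
           'Open': df['Open'].iloc[0],     # first Open
           'High': df['High'].max(),       # highest High
           'Low': df['Low'].min(),         # lowest Low
           'Close': df['Close'].iloc[-1],  # last Close
           'Volume': df['Volume'].sum()    # total Volume
          }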
    

    and pass it to agg():

    In [30]: df.groupby(dr5minute.asof).agg(ohlcsum)
    Out[30]: 
                           Open    High     Low   Close  Volume
    key_0                                                      
    1999-01-04 10:20:00  1.1801  1.1819  1.1801  1.1806      34
    1999-01-04 10:25:00  1.1807  1.1815  1.1776  1.1789      91
    1999-01-04 10:30:00  1.1780  1.1792  1.1776  1.1791      16
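With recent pandas releases, the usual way to give a function access to every column of a group is `groupby(...).apply()`. A minimal sketch, assuming a recent pandas: `ohlcv_of_group` is a hypothetical name, it returns a Series so that `apply()` builds one output column per key, and since it is a Python call per group it will be noticeably slower than built-in named reductions on millions of rows:

```python
import pandas as pd


def ohlcv_of_group(group: pd.DataFrame) -> pd.Series:
    """Summarise one window of M1 bars; apply() passes the whole group as a DataFrame."""
    group = group.sort_index()           # earliest minute first, latest minute last
    return pd.Series({
        "Open": group["Open"].iloc[0],
        "High": group["High"].max(),
        "Low": group["Low"].min(),
        "Close": group["Close"].iloc[-1],
        "Volume": group["Volume"].sum(),
    })


# Rows 10:22 to 10:24 and 10:31 to 10:32 of the question's M1 data.
rows = [
    ("1999-01-04 10:22", 1.1801, 1.1819, 1.1801, 1.1817, 4),
    ("1999-01-04 10:23", 1.1817, 1.1818, 1.1804, 1.1814, 18),
    ("1999-01-04 10:24", 1.1817, 1.1817, 1.1802, 1.1806, 12),
    ("1999-01-04 10:31", 1.1780, 1.1792, 1.1776, 1.1792, 12),
    ("1999-01-04 10:32", 1.1788, 1.1792, 1.1788, 1.1791, 4),
]
m1 = pd.DataFrame(rows, columns=["Date", "Open", "High", "Low", "Close", "Volume"])
m1["Date"] = pd.to_datetime(m1["Date"])
m1 = m1.set_index("Date")

m5 = m1.groupby(m1.index.floor("5min")).apply(ohlcv_of_group)
# The per-group Series mixes floats and ints, so Volume comes back as float.
m5 = m5.astype({"Volume": "int64"})
print(m5)
```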
    

    Though pandas may offer some cleaner built-in magic in the future, hopefully this explains how to work with today’s agg() capabilities.



Related Questions

Could someone please point me in the right direction on how to allow the
could someone please point me in the right direction, I currently have a searchable
Could someone please point me toward a cleaner method to generate a random enum
Could someone please point out a site where I can find an algorithm to
Could someone please demystify interfaces for me or point me to some good examples?
Could someone who is familiar with webkit please explain or point me in the
Could someone please tell us on how to print correctly the handling thread in
Could someone please tell me which objects types can be tested using Regular Expressions
Could someone please explain the best way to connect to an Interbase 7.1 database
Could someone please explain? I couldn't find anything on the internet, everything talks about
