I’m working on automating some reports via pandas and the Google Analytics API. When

Question

Asked: June 18, 2026

I’m working on automating some reports via pandas and the Google Analytics API. When I request several dimensions to split the data by, the resulting record set is well above the default 10k max_results limit imposed by pandas.

To get around this, I’m passing in a large number for the max_results parameter and specifying a chunksize. My intention is to then iterate over the resulting generator to create one large DataFrame which I can do all of my operations on.

import pandas as pd
from pandas.io import ga

# Ask for far more rows than the default limit and read them in chunks.
max_results = 1000000
chunks = ga.read_ga(metrics=["visits"],
                    dimensions=["date", "browser", "browserVersion",
                                "operatingSystem", "operatingSystemVersion",
                                "isMobile", "mobileDeviceInfo"],
                    start_date="2012-12-01",
                    end_date="2012-12-31",
                    max_results=max_results,
                    chunksize=5000)

# Combine all chunks into one DataFrame, then total the visits per day.
stats = pd.concat(list(chunks))
stats.groupby(level="date").sum()

However, some records are clearly not being pulled, because the overall daily sum of visits does not match Google Analytics.

I do not run into this issue when selecting only a couple of dimensions. For instance …

test = ga.read_ga(metrics=["visits"], dimensions=["date"],
            start_date="2012-12-01", end_date="2012-12-31")

test.groupby(level="date").sum()

… produces the same numbers as Google Analytics.

Thanks in advance for the help.

1 Answer

Editorial Team · Answer 1 · 2026-06-18T14:08:26+00:00

The limit of 10,000 rows is imposed by the Google Analytics API, not by pandas (https://developers.google.com/analytics/devguides/reporting/core/v3/reference#maxResults).

The pandas code uses start_index to make multiple requests and work around the limit. I have marked this as a bug in pandas: https://github.com/pydata/pandas/issues/2805
I’ll take a look whenever I get a chance. If you could show some expected data versus what you get via pandas, that would be helpful.
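
One way to produce that comparison is sketched below. It assumes both results carry a "date" index level and a "visits" column, as in the question. The helper name compare_daily_totals is made up for illustration and is not part of pandas.

```python
import pandas as pd


def compare_daily_totals(expected, actual, metric="visits"):
    """Return the dates whose totals differ, with both totals side by side.

    Both frames are assumed to have an index level named "date".
    """
    exp = expected.groupby(level="date")[metric].sum()
    act = actual.groupby(level="date")[metric].sum()
    report = pd.DataFrame({"expected": exp, "actual": act})
    report["missing"] = report["expected"] - report["actual"]
    # Keep only dates that disagree (a date absent from one side also shows up).
    return report[report["missing"] != 0]
```

With the names from the question, compare_daily_totals(test, stats) would list the days on which the multi-dimension pull falls short of the date-only pull.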

As a workaround, I would suggest iterating over each day and making a daily request.
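
A minimal sketch of that workaround follows. The names daily_dates and read_by_day are hypothetical, and fetch stands in for whatever callable performs a single request (for example, a small wrapper around ga.read_ga with the metrics and dimensions already filled in). It is assumed to accept start_date and end_date keyword arguments and to return a DataFrame.

```python
from datetime import date, timedelta

import pandas as pd


def daily_dates(start, end):
    """Yield each date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def read_by_day(fetch, start, end):
    """Call fetch once per day and concatenate the results.

    fetch is a hypothetical callable: fetch(start_date=..., end_date=...)
    must return a DataFrame covering that single day.
    """
    frames = [
        fetch(start_date=day.isoformat(), end_date=day.isoformat())
        for day in daily_dates(start, end)
    ]
    return pd.concat(frames)
```

For the query in the question this would be called as read_by_day(fetch, date(2012, 12, 1), date(2012, 12, 31)). If a single day can itself exceed 10,000 rows, the same shortfall may reappear, so the daily totals are still worth checking against Google Analytics.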
