Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 583917
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 13, 20262026-05-13T14:50:37+00:00 2026-05-13T14:50:37+00:00

How to do this on Google App Engine (Python): SELECT COUNT(DISTINCT user) FROM event

  • 0

How to do this on Google App Engine (Python):

SELECT COUNT(DISTINCT user) FROM event WHERE event_type = "PAGEVIEW" 
AND t >= start_time AND t <= end_time

Long version:

I have a Python Google App Engine application with users that generate events, such as pageviews. I would like to know in a given timespan how many unique users generated a pageview event. The timespan I am most interested in is one week, and there are about a million such events in a given week. I want to run this in a cron job.

My event entities look like this:

class Event(db.Model):
    t = db.DateTimeProperty(auto_now_add=True)
    user = db.StringProperty(required=True)
    event_type = db.StringProperty(required=True)

With an SQL database, I would do something like

SELECT COUNT(DISTINCT user) FROM event WHERE event_type = "PAGEVIEW" 
AND t >= start_time AND t <= end_time

First thought that occurs is to get all PAGEVIEW events and filter out duplicate users. Something like:

query = Event.all()
query.filter("t >=", start_time)
query.filter("t <=", end_time)
usernames = []
for event in query:
    usernames.append(event.user)
answer = len(set(usernames))

But this won’t work, because it will only support up to 1000 events. Next thing that occurs to me is to get 1000 events, then when those run out get the next thousand and so on. But that won’t work either, because going through a thousand queries and retrieving a million entities would take over 30 seconds, which is the request time limit.

Then I thought I should ORDER BY user to faster skip over duplicates. But that is not allowed because I am already using the inequality “t >= start_time AND t <= end_time”.

It seems clear this cannot be accomplished under 30 seconds, so it needs to be fragmented. But finding distinct items seems like it doesn’t split well into subtasks. Best I can think of is on every cron jobcall to find 1000 pageview events and then get distinct usernames from those, and put them in an entity like Chard. It could look something like

class Chard(db.Model):
    usernames = db.StringListProperty(required=True)

So each chard would have up to 1000 usernames in it, less if there were duplicates that got removed. After about a 16 hours (which is fine) I would have all the chards and could do something like:

chards = Chard.all()
all_usernames = set()
for chard in chards:
    all_usernames = all_usernames.union(chard.usernames)
answer = len(all_usernames)

It seems like it might work, but hardly a beautiful solution. And with enough unique users this loop might take too long. I haven’t tested it in hopes someone will come up with a better suggestion, so not if this loop would turn out to be fast enough.

Is there any prettier solution to my problem?

Of course all of this unique user counting could be accomplished easily with Google Analytics, but I am constructing a dashboard of application specific metrics, and intend this to be the first of many stats.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-13T14:50:37+00:00Added an answer on May 13, 2026 at 2:50 pm

    Here is a possibly-workable solution. It relies to an extent on using memcache, so there is always the possibility that your data would get evicted in an unpredictable fashion. Caveat emptor.

    You would have a memcache variable called unique_visits_today or something similar. Every time that a user had their first pageview of the day, you would use the .incr() function to increment that counter.

    Determining that this is the user’s first visit is accomplished by looking at a last_activity_day field attached to the user. When the user visits, you look at that field, and if it is yesterday, you update it to today and increment your memcache counter.

    At midnight each day, a cron job would take the current value in the memcache counter and write it to the datastore while setting the counter to zero. You would have a model like this:

    class UniqueVisitsRecord(db.Model):
        # be careful setting date correctly if processing at midnight
        activity_date = db.DateProperty()
        event_count = IntegerProperty()
    

    You could then simply, easily, quickly get all of the UnqiueVisitsRecords that match any date range and add up the numbers in their event_count fields.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Ask A Question

Stats

  • Questions 384k
  • Answers 384k
  • Best Answers 0
  • User 1
  • Popular
  • Answers
  • Editorial Team

    How to approach applying for a job at a company ...

    • 7 Answers
  • Editorial Team

    How to handle personal stress caused by utterly incompetent and ...

    • 5 Answers
  • Editorial Team

    What is a programmer’s life like?

    • 5 Answers
  • Editorial Team
    Editorial Team added an answer It is a common request, check out this project. Find… May 14, 2026 at 11:18 pm
  • Editorial Team
    Editorial Team added an answer If your project is written in one of the languages… May 14, 2026 at 11:18 pm
  • Editorial Team
    Editorial Team added an answer Conceptually there are a couple obvious ways to achieve this.… May 14, 2026 at 11:18 pm

Trending Tags

analytics british company computer developers django employee employer english facebook french google interview javascript language life php programmer programs salary

Top Members

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.