The following bit of code runs regularly as a cron job and is turning out to be very computationally expensive! The main problem is in the for loop. I think it could be made more efficient with better filtering, but I’m at a loss as to how to do that.
    free_membership_type = MembershipType.all().filter("membership_class =", "Free").filter("live =", True).get()
    all_free_users = UserMembershipType.all().filter("membership_active =", True)
    all_free_users = all_free_users.filter("membership_type =", free_membership_type).fetch(limit=999999)
    if all_free_users:
        for free_user in all_free_users:
            activation_status = ActivationStatus.all().filter("user =", free_user.user).get()
            if activation_status and activation_status.activated:
                # get() returns a WeeklyLimits entity (or None), so compare its documents_left property
                weekly_limits = WeeklyLimits.all().filter("user =", free_user.user).get()
                if weekly_limits and weekly_limits.documents_left > 0:
                    pass  # do something...
The models which the code uses are:
    class MembershipType(db.Model):
        membership_class = db.StringProperty()
        membership_code = db.StringProperty()
        live = db.BooleanProperty(default = False)

    class UserMembershipType(db.Model):
        user = db.ReferenceProperty(UserModel)
        membership_type = db.ReferenceProperty(MembershipType)
        membership_active = db.BooleanProperty(default = False)

    class ActivationStatus(db.Model):
        user = db.ReferenceProperty(UserModel)
        activated = db.BooleanProperty(default = False)

    class WeeklyLimits(db.Model):
        user = db.ReferenceProperty(UserModel)
        membership_type = db.ReferenceProperty(MembershipType)
        documents_left = db.IntegerProperty(default = 0)
The code I’m using in production does make better use of caching for the various entities, however the for loop still has to cycle through a bunch of users to finally find the few that it needs to do the operation on. Ideally I’d filter out all of the users that don’t fulfil the criteria and only then start looping through the list of users – is there some kind of magic bullet that I can use here to achieve this?
The magic that you are probably looking for is denormalization. It looks to me like these classes can all be meaningfully combined into a single model:
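The combined model itself is not shown here, so what follows is only a sketch of what it might look like. The class name `UserMembership` and the property `membership_live` are made up for illustration, and `UserModel` is stubbed so the snippet stands alone. The other property names are copied from the four models in the question.

```python
from google.appengine.ext import db


class UserModel(db.Model):
    """Stand-in for the existing user model from the question."""


class UserMembership(db.Model):  # hypothetical name for the merged model
    user = db.ReferenceProperty(UserModel)
    # Copied from MembershipType
    membership_class = db.StringProperty()
    membership_code = db.StringProperty()
    membership_live = db.BooleanProperty(default=False)  # was MembershipType.live
    # Copied from UserMembershipType
    membership_active = db.BooleanProperty(default=False)
    # Copied from ActivationStatus
    activated = db.BooleanProperty(default=False)
    # Copied from WeeklyLimits
    documents_left = db.IntegerProperty(default=0)
```

Each entity then carries everything the cron job needs to decide whether a user qualifies.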
Then, you can use one query to do all of your filtering.
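As a rough sketch of that one query, assume the combined model exposes properties named `membership_class`, `membership_live`, `membership_active`, `activated` and `documents_left`. Those names and the helper `eligible_free_users` are hypothetical; the only API relied on is `filter()`, which returns the query so that calls can be chained, as the code in the question already does.

```python
def eligible_free_users(query):
    """Narrow a query over the combined model to users who still qualify.

    `query` is assumed to be a db.Query over the denormalized model.
    All property names below are assumptions about that model.
    """
    return (query
            .filter("membership_class =", "Free")
            .filter("membership_live =", True)
            .filter("membership_active =", True)
            .filter("activated =", True)
            .filter("documents_left >", 0))  # the single inequality filter
```

You would call it as `eligible_free_users(UserMembership.all())`, where `UserMembership` stands for whatever you name the combined model, and then iterate over the result. Because this mixes equality filters with an inequality filter on `documents_left`, the datastore will most likely need a composite index for it.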
Over-normalization is a common anti-pattern in AppEngine development. The models that you posted look like they might as well be table definitions for a relational database (although it’s arguable whether that is more compartmentalized than needed even for that scenario), and AppEngine’s datastore is very much not a relational database.
Can you see any downside to storing all of those fields in a single model?