I have a database structure; since most of it is irrelevant here, I’ll describe only the relevant pieces. Let’s take the Item object as an example:
items_table = Table("invtypes", gdata_meta,
                    Column("typeID", Integer, primary_key = True),
                    Column("typeName", String, index=True),
                    Column("marketGroupID", Integer, ForeignKey("invmarketgroups.marketGroupID")),
                    Column("groupID", Integer, ForeignKey("invgroups.groupID"), index=True))
mapper(Item, items_table,
       properties = {"group" : relation(Group, backref = "items"),
                     "_Item__attributes" : relation(Attribute, collection_class = attribute_mapped_collection('name')),
                     "effects" : relation(Effect, collection_class = attribute_mapped_collection('name')),
                     "metaGroup" : relation(MetaType,
                                            primaryjoin = metatypes_table.c.typeID == items_table.c.typeID,
                                            uselist = False),
                     "ID" : synonym("typeID"),
                     "name" : synonym("typeName")})
I want to achieve some performance improvements in the SQLAlchemy/database layer, and have a couple of ideas:
1) Requesting the same item twice:
item = session.query(Item).get(11184)
item = None  # reference to item is lost, object is garbage collected
item = session.query(Item).get(11184)
Each request generates and issues an SQL query. To avoid this, I use two custom maps for the Item object:
itemMapId = {}
itemMapName = {}
@cachedQuery(1, "lookfor")
def getItem(lookfor, eager=None):
    if isinstance(lookfor, (int, float)):
        id = int(lookfor)
        if eager is None and id in itemMapId:
            item = itemMapId[id]
        else:
            item = session.query(Item).options(*processEager(eager)).get(id)
            itemMapId[item.ID] = item
            itemMapName[item.name] = item
    elif isinstance(lookfor, basestring):
        if eager is None and lookfor in itemMapName:
            item = itemMapName[lookfor]
        else:
            # Items have unique names, so we can fetch just first result w/o ensuring its uniqueness
            item = session.query(Item).options(*processEager(eager)).filter(Item.name == lookfor).first()
            itemMapId[item.ID] = item
            itemMapName[item.name] = item
    return item
I believe SQLAlchemy does similar object tracking, at least by primary key (item.ID). If it does, I can drop both maps (although dropping the name map will require minor modifications to the application that uses these queries) so that I use the stock methods instead of duplicating functionality. The actual question is: if SQLAlchemy has such functionality, how do I access it?
2) Eager loading of relationships often saves a lot of requests to the database. Say I’ll definitely need the following set of properties for item=Item():
item.group (Group object, according to groupID of our item)
item.group.items (fetch all items from items list of our group)
item.group.items.metaGroup (metaGroup object/relation for every item in the list)
If I have an item ID and no item is loaded yet, I can request it from the database, eagerly loading everything I need: SQLAlchemy will join the group, its items and the corresponding metaGroups within a single query. If I accessed them with the default lazy loading, SQLAlchemy would need to issue 1 query to grab the item + 1 to get the group + 1*#items for all items in the list + 1*#items to get the metaGroup of each item, which is wasteful.
2.1) But what if I already have an Item object fetched, and some of the properties I want to load are already loaded? As far as I understand, when I re-fetch an object from the database, its already loaded relations do not become unloaded. Am I correct?
2.2) If I have an Item object fetched and want to access its group, I can just call getGroup with item.groupID, applying any eager options I need (“items” and “items.metaGroup”). It should properly load the group and its requested relations without touching the item itself. Will SQLAlchemy properly map this fetched group to item.group, so that accessing item.group won’t fetch anything from the underlying database?
2.3) Suppose I have the following things fetched from the database: the original item, item.group, and some portion of the items from the item.group.items list, some of which may have metaGroup loaded. What would be the best strategy for completing the data structure so that it matches the eager list above: re-fetch the group with the (“items”, “items.metaGroup”) eager load, or check each item from the items list individually and load the item or its metaGroup if it is not loaded? It seems to depend on the situation, because if everything has already been loaded some time ago, issuing such a heavy query is pointless. Does SQLAlchemy provide a way to track whether an object’s relation is loaded, with the ability to look deeper than just one level?
As an illustration of 2.3: I can fetch the group with ID 83, eagerly fetching “items” and “items.metaGroup”. Is there a way to determine from an item (which has a groupID of 83) whether it has “group”, “group.items” and “group.items.metaGroup” loaded, using SQLAlchemy tools (in this case all of them should be loaded)?
To force loading of lazy attributes, just access them. This is the simplest way and it works fine for relations, but it is not as efficient for Columns (you will get a separate SQL query for each column in the same table). You can get a list of all unloaded properties (both relations and columns) from sqlalchemy.orm.attributes.instance_state(obj).unloaded.
You don’t use deferred columns in your example, but I’ll describe them here for completeness. The typical scenario for handling deferred columns is the following:
- Mark the selected columns with deferred(). Combine them into one or several groups by using the group parameter to deferred().
- Use the undefer() and undefer_group() options in a query when desired.
Unfortunately this doesn’t work in reverse: you can combine columns into groups without deferring their loading by default with column_property(Column(…), group=…), but the defer() option won’t affect them (it works for Columns only, not column properties, at least in 0.6.7).
To force loading of deferred column properties, session.refresh(obj, attribute_names=…), suggested by Nathan Villaescusa, is probably the best solution. The only disadvantage I see is that it expires the attributes first, so you have to ensure there are no already loaded attributes among those passed as the attribute_names argument (e.g. by using an intersection with state.unloaded).
Update
1) SQLAlchemy does track loaded objects. That’s how the ORM works: there must be only one object in the session for each identity. Its internal cache is weak by default (use weak_identity_map=False to change this), so the object is expunged from the cache as soon as there is no reference to it in your code. SQLAlchemy won’t issue an SQL request for query.get(pk) when the object is already in the session. But this works for the get() method only, so query.filter_by(id=pk).first() will issue an SQL request and refresh the object in the session with the loaded data.
2) Eager loading of relations will lead to fewer requests, but it’s not always faster. You have to check this for your database and data.
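Points 1) and 2) can be checked by counting the SELECT statements a session emits. The following is only a minimal sketch under several assumptions: it targets SQLAlchemy 1.4 or later (much newer than the 0.6.7 discussed here, so it uses session.get() and joinedload() rather than the mapper()/relation() calls from the question), it runs against an in-memory SQLite database, and the Group and Item classes are trimmed-down stand-ins for the real mappings. The count_selects helper is hypothetical, not part of SQLAlchemy.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()


class Group(Base):  # trimmed-down stand-in for the real Group mapping
    __tablename__ = "invgroups"
    groupID = Column(Integer, primary_key=True)
    items = relationship("Item", back_populates="group")


class Item(Base):  # trimmed-down stand-in for the real Item mapping
    __tablename__ = "invtypes"
    typeID = Column(Integer, primary_key=True)
    typeName = Column(String)
    groupID = Column(Integer, ForeignKey("invgroups.groupID"))
    group = relationship("Group", back_populates="items")


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


def count_selects(work):
    """Hypothetical helper: run work() and count the SELECTs it emitted."""
    statements.clear()
    work()
    return sum(1 for s in statements if s.lstrip().upper().startswith("SELECT"))


with Session(engine) as session:
    members = [Item(typeID=1, typeName="a"), Item(typeID=2, typeName="b")]
    session.add(Group(groupID=83, items=members))
    session.commit()

with Session(engine) as session:
    def get_twice():
        first = session.get(Item, 1)
        second = session.get(Item, 1)  # answered from the identity map
        assert first is second

    def filter_again():
        # A filter-based lookup always goes to the database.
        session.query(Item).filter_by(typeID=1).first()

    selects_for_get_twice = count_selects(get_twice)
    selects_for_filter = count_selects(filter_again)

with Session(engine) as session:
    def lazy():
        item = session.get(Item, 1)
        return [other.typeName for other in item.group.items]

    selects_lazy = count_selects(lazy)

with Session(engine) as session:
    def eager():
        opts = [joinedload(Item.group).joinedload(Group.items)]
        item = session.get(Item, 1, options=opts)
        return [other.typeName for other in item.group.items]

    selects_eager = count_selects(eager)

print(selects_for_get_twice, selects_for_filter, selects_lazy, selects_eager)
```

Each scenario uses a fresh session so that earlier loads cannot hide a query. The counts only say how many round trips happen; whether the single joined query is actually faster still has to be measured on the real database.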
2.1) Refetching data from the database won’t unload objects bound via relations.
2.2) item.group is loaded using the query.get() method, so it won’t lead to an SQL request if the object is already in the session.
2.3) Yes, it depends on the situation. In most cases it’s best to hope SQLAlchemy will use the right strategy :). For an already loaded relation you can check whether the related objects’ relations are loaded via state.unloaded, and so on recursively to any depth. But when a relation is not loaded yet, you can’t know whether the related objects and their relations are already loaded: even when the relation is not yet loaded, the related object[s] might already be in the session (just imagine you request the first item, load its group, and then request another item that has the same group). For your particular example I see no problem with just checking state.unloaded recursively.
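The recursive check from 2.3 could look roughly like the sketch below. is_path_loaded is a hypothetical helper, not a SQLAlchemy API. It assumes SQLAlchemy 1.4 or later, where sqlalchemy.inspect(obj) returns the same instance state as instance_state(obj), and it uses trimmed-down stand-in models with an in-memory SQLite database. As noted above, it reports a path as not loaded as soon as one relation on the way is unloaded, even if the related objects happen to be in the session already.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()


class Group(Base):  # trimmed-down stand-in for the real Group mapping
    __tablename__ = "invgroups"
    groupID = Column(Integer, primary_key=True)
    items = relationship("Item")


class Item(Base):  # trimmed-down stand-in for the real Item mapping
    __tablename__ = "invtypes"
    typeID = Column(Integer, primary_key=True)
    groupID = Column(Integer, ForeignKey("invgroups.groupID"))
    metaGroup = relationship("MetaType", uselist=False)


class MetaType(Base):  # trimmed-down stand-in for the real MetaType mapping
    __tablename__ = "invmetatypes"
    typeID = Column(Integer, ForeignKey("invtypes.typeID"), primary_key=True)


def is_path_loaded(obj, path):
    """Hypothetical helper: True if every relation along the dotted path
    (e.g. "items.metaGroup") is already loaded, checked without emitting SQL."""
    head, _, rest = path.partition(".")
    if head in inspect(obj).unloaded:
        return False
    if not rest:
        return True
    value = getattr(obj, head)  # already loaded, so this does not query
    if value is None:
        return True  # nothing further to load below an empty relation
    if isinstance(value, dict):  # e.g. attribute_mapped_collection
        related = list(value.values())
    elif isinstance(value, (list, set, tuple)):
        related = list(value)
    else:
        related = [value]
    return all(is_path_loaded(child, rest) for child in related)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Group(groupID=83, items=[Item(typeID=1), Item(typeID=2)]))
    session.commit()

with Session(engine) as session:
    plain = session.get(Group, 83)  # no eager options: "items" stays lazy
    plain_loaded = is_path_loaded(plain, "items")

with Session(engine) as session:
    opts = [joinedload(Group.items).joinedload(Item.metaGroup)]
    eager = session.get(Group, 83, options=opts)
    eager_loaded = is_path_loaded(eager, "items.metaGroup")

print(plain_loaded, eager_loaded)
```

A caller could use such a check to decide between re-fetching the group with the eager options and leaving the already loaded structure alone.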