I’m creating an access control list for objects in my datastore. Each ACL entry could have a list of all user ids allowed to access the corresponding entry. Then my query to get the list of entities a user can access would be pretty simple:
select * from ACL where accessors = {userId} and searchTerms >= {search}
The problem is that this can only support 2500 users before it hits the index entry limit, and of course it would be very expensive to put an ACL entry with a lot of users because many index entries would need to be changed.
So I thought about adding a list of GROUPs of users that are allowed to access an entity. That could drastically lower the number of index entries needed for each ACL entry, but querying gets longer because I have to query for every possible group that a user is in:
select * from ACL where accessors = {userId} and searchTerms >= {search}
for (GroupId id : theSetOfGroupsTheUserBelongsTo) {
select * from ACL where accessingGroups = {id} and searchTerms >= {search}
}
mergeAllTheseResultsTogether()
which would take a long time, be much more difficult to page through, etc.
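To make the merge step concrete, here is a minimal sketch, assuming each per-user or per-group query hands back its entity keys already sorted. The function name `merge_acl_results` and the `after` cursor are hypothetical, and plain lists stand in for datastore query results.

```python
import heapq
from itertools import islice


def merge_acl_results(result_lists, page_size, after=None):
    """Merge sorted per-query key lists into one deduplicated page.

    result_lists: lists of entity keys, each sorted ascending.
    after: last key of the previous page (a hypothetical cursor), or None.
    """
    merged = heapq.merge(*result_lists)  # lazy k-way merge of sorted inputs

    def unique(keys):
        previous = object()  # sentinel that equals no real key
        for key in keys:
            # Duplicates are adjacent after the merge, so one comparison is enough.
            if key != previous and (after is None or key > after):
                yield key
            previous = key

    return list(islice(unique(merged), page_size))


# Stand-ins for the results of the user query and two group queries.
by_user = ["topic:algebra", "topic:biology"]
by_group_a = ["topic:algebra", "topic:chemistry"]
by_group_b = ["topic:biology", "topic:geometry"]

all_results = [by_user, by_group_a, by_group_b]
page1 = merge_acl_results(all_results, page_size=3)
page2 = merge_acl_results(all_results, page_size=3, after=page1[-1])
```

This still re-runs every group query for each page, which is exactly the paging cost described above; the sketch only shows that the merge itself is cheap once the results are sorted.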
Can anyone recommend a way to fetch a list of entities from an ACL that doesn’t limit the number of accessing users?
Edit for more detail:
I’m searching and sorting on a long set of academic topics in use at a school. Some of the topics are created by administrators and should be school-wide. Others are created by teachers and are probably only relevant to those teachers. I want to create a google-docs-list-like hierarchy of collections that treats each topic like a document. The searchTerms field would be a list of words in the topic name – there is not a lot of internal text to search. Each topic will be in at least one collection (the organization’s “root” collection) and could be in as many as 10-20 other collections, all managed by different people. Ideally there’d be no upper limit to the number of collections a document might appear in. My struggle here is to produce a list of all of the entities a particular user has at least read access to – the analog in google docs would be the “All Items” view.
Assuming that your documents and group permissions change less often (or are less time critical) than user queries, I suggest this (which is how I’m solving a similar problem):
In your ACL, include the fields that the queries below rely on: accessors, searchTerms, and numberOfAccessors.
The key_name for ACL would be something like
"indexed_document_id||index_num"
The index_num in the key lets you have multiple entities storing the list of users, in case there are more than 5000 (the datastore limit on items in a list), or however many you want to allow per list to reduce the cost of loading one (though you won’t need to do that often).
Don’t forget that the document to be accessed should be the parent of the index entity. That way you can do a select __key__ query rather than a select * (this avoids having to deserialize the accessors and searchTerms fields). You can search and return the parent() of the entity without needing to access any of the fields. More on that and other GAE search design at this blog post. Sadly, that blog post doesn’t cover ACL indexes like ours.
Disclaimer: I’ve now encountered a problem with this design: which documents a user has access to is controlled by whether they are following another user. That means that if they follow or unfollow, there could be a large number of existing documents the user needs to be added to or removed from. If this is the case for you, then you might be stuck in the same hole as me if you follow my technique. I currently plan to handle this by updating the indexes for old documents in the background, over time. Someone else answering this question might have a solution to it baked in. If not, I may post it as a separate question.
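The key_name scheme above can be sketched roughly like this. It is only an illustration: plain dicts stand in for datastore entities, `build_index_shards` is a made-up helper name, and the field names are the ones used in the queries in this thread.

```python
MAX_ACCESSORS = 5000  # per-list cap mentioned above; lower it if you prefer smaller entities


def build_index_shards(document_id, accessors, search_terms, max_accessors=MAX_ACCESSORS):
    """Split a document's accessors across index entities keyed 'document_id||index_num'."""
    accessors = sorted(set(accessors))  # drop duplicates, keep a stable order
    shards = {}
    for index_num, start in enumerate(range(0, len(accessors), max_accessors)):
        chunk = accessors[start:start + max_accessors]
        key_name = f"{document_id}||{index_num}"
        shards[key_name] = {
            "parent": document_id,              # the document is the parent of each index entity
            "accessors": chunk,
            "searchTerms": list(search_terms),  # every shard repeats the terms so it is searchable
            "numberOfAccessors": len(chunk),
        }
    return shards


# Tiny cap so the split is visible: 7 users, 3 per shard.
shards = build_index_shards(
    "doc42", [f"user{i}" for i in range(7)], ["linear", "algebra"], max_accessors=3
)
```

Each shard carries its own copy of searchTerms so that a single query on accessors and searchTerms can match it without touching the other shards.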
Analysis of operations on this datastructure:
Add an indexed document:
O(n*m) where n is the number of users and m is the number of search terms
Query an indexed document:
select __key__ from ACL where accessors = {userid} and searchTerms >= {search}
(Though I’m not sure why you use “>=”, actually; in my queries it’s always “=”.)
O(n+m) where n is the number of users and m is the number of search terms. This is pretty fast. It uses the zig-zag merge join of two indexes (one on accessors, one on searchTerms). This assumes that GAE index scans are linear. They might be logarithmic for “=” queries, but I’m not privy to the design of their indexes, nor have I done any tests to verify. Note also that you don’t need to load any of the properties of the index entity.
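For intuition about the zig-zag merge join, here is a simplified sketch over two sorted lists of index-entity keys. It is not GAE's actual implementation (which I have not seen); `zigzag_merge_join` and the sample keys are made up for illustration.

```python
from bisect import bisect_left


def zigzag_merge_join(keys_a, keys_b):
    """Intersect two sorted key lists, letting each scan leap to the other's position."""
    matches = []
    i = j = 0
    while i < len(keys_a) and j < len(keys_b):
        if keys_a[i] == keys_b[j]:
            matches.append(keys_a[i])
            i += 1
            j += 1
        elif keys_a[i] < keys_b[j]:
            # Scan A is behind: seek it to the first key >= scan B's current key.
            i = bisect_left(keys_a, keys_b[j], i)
        else:
            # Scan B is behind: seek it forward the same way.
            j = bisect_left(keys_b, keys_a[i], j)
    return matches


# Keys of index entities matching accessors = {userid} ...
accessor_index = ["doc1||0", "doc3||0", "doc7||0", "doc9||0"]
# ... and keys of index entities matching searchTerms = {search}.
term_index = ["doc2||0", "doc3||0", "doc8||0", "doc9||0"]

hits = zigzag_merge_join(accessor_index, term_index)
# Only the key is needed: the part before "||" identifies the parent document.
documents = [key.split("||")[0] for key in hits]
```

The seek step is where the "might be logarithmic" remark comes in: with a binary search the scans skip over non-matching runs instead of walking them one entry at a time.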
Add access for a user to a particular document
select __key__ from ACL where accessors = {userid} and parent = {key(document)}
select * from ACL where parent = {key(document)} and numberOfAccessors < {5000 (or whatever your max is)} limit 1
O(n) where n is the number of people who have access to the document.
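As I read the two queries above, the first checks whether the user is already listed and the second finds an index entity with spare room. A minimal in-memory sketch of that flow, assuming dicts in place of entities and a hypothetical `add_access` helper:

```python
def add_access(shards, document_id, user_id, search_terms=(), max_accessors=5000):
    """Grant user_id access to document_id; returns the key_name of the shard used."""
    doc_shards = {k: s for k, s in shards.items() if s["parent"] == document_id}

    # First query: nothing to do if some shard already lists the user.
    for key_name, shard in doc_shards.items():
        if user_id in shard["accessors"]:
            return key_name

    # Second query: reuse the first shard that still has room ...
    for key_name, shard in doc_shards.items():
        if shard["numberOfAccessors"] < max_accessors:
            break
    else:
        # ... otherwise create a shard at the lowest unused index_num.
        used = {int(k.rsplit("||", 1)[1]) for k in doc_shards}
        index_num = next(n for n in range(len(used) + 1) if n not in used)
        key_name = f"{document_id}||{index_num}"
        shard = shards[key_name] = {
            "parent": document_id,
            "accessors": [],
            "searchTerms": list(search_terms),  # a new shard needs the terms to be searchable
            "numberOfAccessors": 0,
        }

    shard["accessors"].append(user_id)
    shard["numberOfAccessors"] = len(shard["accessors"])
    return key_name


shards = {}
first = add_access(shards, "doc42", "alice", ["algebra"], max_accessors=2)
second = add_access(shards, "doc42", "bob", ["algebra"], max_accessors=2)
third = add_access(shards, "doc42", "carol", ["algebra"], max_accessors=2)
again = add_access(shards, "doc42", "alice", ["algebra"], max_accessors=2)
```

In a real datastore the check-then-write would presumably need to run in a transaction on the document's entity group, which the parent relationship makes possible.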
Remove access for a user to a particular document
select * from ACL where accessors = {userid} and parent = {key(document)}
O(n) where n is the number of people who have access to the document.
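Removal can be sketched the same way. Again this is an in-memory illustration with a hypothetical `remove_access` helper; deleting a shard once it is empty is my own choice here, not something the design requires.

```python
def remove_access(shards, document_id, user_id):
    """Revoke user_id's access to document_id; returns True if anything changed."""
    changed = False
    for key_name in list(shards):  # copy the keys because shards may be deleted below
        shard = shards[key_name]
        if shard["parent"] != document_id or user_id not in shard["accessors"]:
            continue
        shard["accessors"].remove(user_id)
        shard["numberOfAccessors"] = len(shard["accessors"])
        if not shard["accessors"]:
            del shards[key_name]  # an empty shard grants nothing, so drop it
        changed = True
    return changed


shards = {
    "doc42||0": {"parent": "doc42", "accessors": ["alice", "bob"], "numberOfAccessors": 2},
    "doc42||1": {"parent": "doc42", "accessors": ["carol"], "numberOfAccessors": 1},
}
removed = remove_access(shards, "doc42", "carol")
missing = remove_access(shards, "doc42", "dave")
```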
Compact the indexes
You’ll have to do this once in a while if you do a lot of removals. I’m not sure of the best way to detect when it is needed.
select * from ACL where parent = {key(document)} and numberOfAccessors < {2500 (or half whatever your max is)}
O(n) where n is the number of people who have access to the document.
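One possible shape for the compaction pass, sketched in memory: collect the shards that are less than half full, pool their users, and refill as few shards as possible. The `compact` helper and the choice to reuse the freed key names are assumptions of mine.

```python
def compact(shards, document_id, max_accessors=5000):
    """Merge a document's less-than-half-full shards; returns how many shards were freed."""
    sparse = sorted(
        k for k, s in shards.items()
        if s["parent"] == document_id and s["numberOfAccessors"] < max_accessors // 2
    )
    if len(sparse) < 2:
        return 0  # a single sparse shard has nothing to merge with

    users = [u for k in sparse for u in shards[k]["accessors"]]
    terms = shards[sparse[0]].get("searchTerms", [])
    for k in sparse:
        del shards[k]

    # Refill the freed key names in order, max_accessors users at a time.
    reused = 0
    for start in range(0, len(users), max_accessors):
        chunk = users[start:start + max_accessors]
        shards[sparse[reused]] = {
            "parent": document_id,
            "accessors": chunk,
            "searchTerms": list(terms),
            "numberOfAccessors": len(chunk),
        }
        reused += 1
    return len(sparse) - reused


shards = {
    "doc42||0": {"parent": "doc42", "accessors": ["al", "bo", "cy", "di"], "numberOfAccessors": 4},
    "doc42||1": {"parent": "doc42", "accessors": ["eve"], "numberOfAccessors": 1},
    "doc42||2": {"parent": "doc42", "accessors": ["frank"], "numberOfAccessors": 1},
}
freed = compact(shards, "doc42", max_accessors=4)
```

Because every sparse shard holds fewer than half the maximum, the pooled users always fit into no more shards than were collected, so the refill loop never runs out of key names.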