I’m reading the documentation for Google App Engine and stumbled upon something that I don’t quite understand:
The Datastore uses optimistic concurrency to manage transactions. When
two or more application instances try to change the same entity group
at the same time (either by updating existing entities or by creating
new ones), the first application to commit its changes will succeed
and all others will fail on commit. These other applications can then
try their transactions again to apply them to the updated data. Note
that because the Datastore works this way, using entity groups limits
the number of concurrent writes you can do to any entity in a given
group.
Does this mean that if two different users from two different devices try to modify the same object, only one of them will succeed? Is this typical database behavior, or just a GAE limitation? How do other databases usually handle such situations, where two or more users try to modify the same object?
And what does it mean that when two or more application instances try to create new entities, only one will succeed? Am I understanding that wrong? Can no two application instances add a new object to the same table?
While I can't speak for document databases like MongoDB (aka NoSQL), I can tell you that a relational database would let only one operation take effect at a time. However, what that means in practice comes down to what the operations are.
For example, say two users try to modify the same row, and each modification touches only a subset of the columns:
User 1:
User 2:
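A minimal sketch of two such partial updates, using Python's built-in `sqlite3` module as a stand-in database. The table name `MyTable`, the key column `Id`, the exact statements, and the starting values `'a'`, `'b'`, `'c'` are all hypothetical, chosen so that User 1 sets `Col1` and `Col2` while User 2 sets `Col2` and `Col3`:

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """In-memory table with one row keyed 'abc' (hypothetical schema and values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (Id TEXT PRIMARY KEY, Col1 TEXT, Col2 TEXT, Col3 TEXT)")
    conn.execute("INSERT INTO MyTable VALUES ('abc', 'a', 'b', 'c')")
    return conn

def partial_updates(user1_first: bool) -> tuple:
    """Run two column-subset updates in the given order and return the final row."""
    conn = make_db()
    # Hypothetical User 1 statement: touches only Col1 and Col2.
    user1 = ("UPDATE MyTable SET Col1 = ?, Col2 = ? WHERE Id = ?", ("1", "2", "abc"))
    # Hypothetical User 2 statement: touches only Col2 and Col3.
    user2 = ("UPDATE MyTable SET Col2 = ?, Col3 = ? WHERE Id = ?", ("x", "y", "abc"))
    order = [user1, user2] if user1_first else [user2, user1]
    for sql, params in order:
        conn.execute(sql, params)
    return conn.execute("SELECT Col1, Col2, Col3 FROM MyTable WHERE Id = 'abc'").fetchone()

print(partial_updates(user1_first=True))   # ('1', 'x', 'y')
print(partial_updates(user1_first=False))  # ('1', '2', 'y')
```

Whichever order the statements run in, `Col1` and `Col3` end up the same; only the shared column `Col2` depends on who ran last.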
You can be certain that `Col1` will be '1' and `Col3` will be 'y', as each of those columns is updated by only one of the statements. The value of `Col2` will be determined by whichever command executed last.

Likewise, if one user updated the row and another user deleted it, the row would end up deleted no matter what. If the update command came first, the update would succeed and the row would then be deleted. If the delete command came first, the row would be deleted and the update would do nothing, since the row no longer exists (the `where` clause would not match any rows).

However, very few applications actually bother to issue updates that include only the changed columns. In almost all applications, commands are built at the table level and update every column, and the "current" values (changed or not) are passed into them. This is the reason for using optimistic concurrency.
Assuming that row `abc` currently has the values:

and that both users retrieved the row at the same time, our commands above would more realistically look like this:
User 1:
User 2:
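A minimal sketch of this "lost update", again using Python's `sqlite3` as a stand-in. The table `MyTable`, the key column `Id`, and the starting values `'a'`, `'b'`, `'c'` are hypothetical. Both users read the row, edit their own copy, and then write back every column:

```python
import sqlite3

def setup() -> sqlite3.Connection:
    """In-memory table with one row 'abc' (hypothetical schema and values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (Id TEXT PRIMARY KEY, Col1 TEXT, Col2 TEXT, Col3 TEXT)")
    conn.execute("INSERT INTO MyTable VALUES ('abc', 'a', 'b', 'c')")
    return conn

def read_row(conn: sqlite3.Connection, row_id: str) -> tuple:
    return conn.execute("SELECT Col1, Col2, Col3 FROM MyTable WHERE Id = ?", (row_id,)).fetchone()

def full_row_update(conn: sqlite3.Connection, row_id: str, values: tuple) -> None:
    """Table-level update: writes every column, whether it changed or not."""
    conn.execute("UPDATE MyTable SET Col1 = ?, Col2 = ?, Col3 = ? WHERE Id = ?", (*values, row_id))

def lost_update_demo() -> tuple:
    conn = setup()
    snapshot = read_row(conn, "abc")  # both users read the row at the same time
    user1 = ("1", "2", snapshot[2])   # User 1 edits Col1 and Col2 in their copy
    user2 = (snapshot[0], "x", "y")   # User 2 edits Col2 and Col3; its Col1 is stale
    full_row_update(conn, "abc", user1)
    full_row_update(conn, "abc", user2)  # silently undoes User 1's Col1 = '1'
    return read_row(conn, "abc")

print(lost_update_demo())  # ('a', 'x', 'y')
```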
Now we have a problem: even though our second command doesn't care about the value in `Col1`, it will overwrite the value set by User 1 if it runs after User 1's command. Likewise, if User 2 hit first, then User 1 would overwrite the value written to `Col3` by User 2.

Optimistic concurrency essentially expands the `where` clause to check the value of every column, not just the key of the table. This way you can be sure that you aren't overwriting any changes made by someone (or something) else between the time you retrieved the row and the time you saved it back.

So, given the same conditions, our commands would look like:
User 1:
User 2:
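A minimal sketch of such an optimistic update with Python's `sqlite3`, under the same hypothetical `MyTable` / `Id` schema and starting values `'a'`, `'b'`, `'c'`. The `where` clause compares every column against the values originally read, and `Cursor.rowcount` reveals whether the row still matched:

```python
import sqlite3

def setup() -> sqlite3.Connection:
    """In-memory table with one row 'abc' (hypothetical schema and values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (Id TEXT PRIMARY KEY, Col1 TEXT, Col2 TEXT, Col3 TEXT)")
    conn.execute("INSERT INTO MyTable VALUES ('abc', 'a', 'b', 'c')")
    return conn

def read_row(conn: sqlite3.Connection, row_id: str) -> tuple:
    return conn.execute("SELECT Col1, Col2, Col3 FROM MyTable WHERE Id = ?", (row_id,)).fetchone()

def optimistic_update(conn: sqlite3.Connection, row_id: str, original: tuple, new: tuple) -> bool:
    """Write `new` only if every column still holds the `original` value.

    Returns False when someone else changed the row first (zero rows matched).
    Note: `=` never matches NULL; nullable columns would need extra handling.
    """
    cur = conn.execute(
        "UPDATE MyTable SET Col1 = ?, Col2 = ?, Col3 = ? "
        "WHERE Id = ? AND Col1 = ? AND Col2 = ? AND Col3 = ?",
        (*new, row_id, *original),
    )
    return cur.rowcount == 1

def optimistic_demo() -> tuple:
    conn = setup()
    original = read_row(conn, "abc")  # both users read the same snapshot
    ok1 = optimistic_update(conn, "abc", original, ("1", "2", original[2]))
    ok2 = optimistic_update(conn, "abc", original, (original[0], "x", "y"))
    return ok1, ok2, read_row(conn, "abc")

print(optimistic_demo())  # (True, False, ('1', '2', 'c'))
```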
This means that whichever command hits the database last won’t actually do anything, since the columns will no longer have their original values.
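The losing application then has to retry, which is what the App Engine documentation means by "these other applications can then try their transactions again". A minimal sketch of that read-modify-write retry loop in plain SQL via Python's `sqlite3` (not the Datastore API), under the same hypothetical `MyTable` / `Id` schema. The `before_write` hook is a hypothetical test aid that lets a competing writer slip in between the read and the write:

```python
import sqlite3
from typing import Callable, Optional, Tuple

Row = Tuple[str, str, str]

def setup() -> sqlite3.Connection:
    """In-memory table with one row 'abc' (hypothetical schema and values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (Id TEXT PRIMARY KEY, Col1 TEXT, Col2 TEXT, Col3 TEXT)")
    conn.execute("INSERT INTO MyTable VALUES ('abc', 'a', 'b', 'c')")
    return conn

def read_row(conn: sqlite3.Connection, row_id: str) -> Row:
    return conn.execute("SELECT Col1, Col2, Col3 FROM MyTable WHERE Id = ?", (row_id,)).fetchone()

def optimistic_update(conn: sqlite3.Connection, row_id: str, original: Row, new: Row) -> bool:
    cur = conn.execute(
        "UPDATE MyTable SET Col1 = ?, Col2 = ?, Col3 = ? "
        "WHERE Id = ? AND Col1 = ? AND Col2 = ? AND Col3 = ?",
        (*new, row_id, *original),
    )
    return cur.rowcount == 1

def update_with_retry(conn: sqlite3.Connection, row_id: str, change: Callable[[Row], Row],
                      max_attempts: int = 3,
                      before_write: Optional[Callable[[], None]] = None) -> int:
    """Re-read the row and re-apply `change` until the optimistic write wins.

    Returns the attempt number that succeeded; raises if every attempt loses.
    """
    for attempt in range(1, max_attempts + 1):
        original = read_row(conn, row_id)
        new = change(original)
        if before_write is not None:
            before_write()  # hypothetical hook simulating a concurrent writer
        if optimistic_update(conn, row_id, original, new):
            return attempt
    raise RuntimeError("gave up after repeated concurrent modifications")

def retry_demo() -> tuple:
    conn = setup()
    interfered = []

    def competing_writer() -> None:
        # Another user updates Col1/Col2, but only before our first write.
        if not interfered:
            interfered.append(True)
            conn.execute("UPDATE MyTable SET Col1 = '1', Col2 = '2' WHERE Id = 'abc'")

    # Our intent: set Col2 and Col3, keeping whatever Col1 currently is.
    attempts = update_with_retry(conn, "abc", lambda r: (r[0], "x", "y"),
                                 before_write=competing_writer)
    return attempts, read_row(conn, "abc")

print(retry_demo())  # (2, ('1', 'x', 'y'))
```

The first attempt loses because the competing write changed `Col1` and `Col2`; the second attempt re-reads the fresh row, so the other user's `Col1 = '1'` survives instead of being overwritten.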