I have a model called User, and a user has a property relatedUsers, which, in its general format, is an array of integers. Now, there will be times when I want to check if a certain number exists in a User’s relatedUsers array. I see two ways of doing this:
-
Use a standard Python list with indexed values (or maybe not) and just run an IN query and see if that number is in there.
-
Having the key to that User, get back the value for property relatedUsers, which is an array in JSON string format. Decode the string, and check if the number is in there.
Which one is more efficient? Would number 1 cost more reads than option 2? And would number 1 writes cost more than number 2, since indexing each value costs a write. What if I don’t index — which solution would be better then?
Here’s your costs vs capability, option wise:
Putting the values in an indexed list will be far more expensive. You will incur the cost of one write for each value in the list, which can explode depending on how many friends your users have. It’s possible for this cost explosion to be worse if you have certain kinds of composite indexes. The good side is that you get to run queries on this information: you can get query for a list of users who are friends with a particular user, for example.
No extra index or write costs here. The problem is that you lose querying functionality.
If you know that you’re only going to be doing checks only on the current user’s list of friends, by all means go with option 2. Otherwise you might have to look at your design a little more carefully.