I am not facing this issue right now, but it always comes to mind. Of course, this would only be after replicating data, using memcached, and partitioning.
Say I have a table photo_tbl with a structure like this:
user_id
group_id
date_added
.... and many more
On the user profile page we show the user's photos by running this query:
SELECT ...... FROM photo_tbl WHERE user_id=? order by date_added desc
On the groups page we show the group's photos by running this query:
SELECT ...... FROM photo_tbl WHERE group_id=? order by date_added desc
In this case, if the table has billions of rows and needs sharding, which key would you shard on without hurting performance for the two queries above?
If my shard key is user_id, then for groups I have to go to multiple databases to get the desired results (by changing application logic). If it is group_id, then for user profiles I have to go to multiple databases instead.
You basically have two “shard trees”: you need to shard by user and by group. If you attempt this in a single table, then one of the two queries will always have to run across all shards, which isn’t so bad if you have an efficient way of doing this. For instance, with dbShards you can run efficient queries in parallel across shards (we call these “Go Fish” queries).
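To make the cross-shard query concrete, here is a rough sketch in plain Python. It is not dbShards code or its API; the shard count, the modulo routing, the in-memory lists standing in for databases, and all function names are assumptions made for illustration. date_added is treated as an integer timestamp. The table is sharded by user_id, so the user query hits one shard, while the group query is sent to every shard in parallel and the per-shard results (each already sorted newest first) are merged.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical shard count

# Each "shard" is a list of rows standing in for a separate database.
# A row is (user_id, group_id, date_added) and lives on shard user_id % NUM_SHARDS.
shards = [[] for _ in range(NUM_SHARDS)]


def insert_photo(user_id, group_id, date_added):
    """Route the write by the shard key (user_id)."""
    shards[user_id % NUM_SHARDS].append((user_id, group_id, date_added))


def photos_for_user(user_id):
    """Single-shard read: the shard key tells us exactly where to look."""
    rows = [r for r in shards[user_id % NUM_SHARDS] if r[0] == user_id]
    return sorted(rows, key=lambda r: r[2], reverse=True)


def _group_rows_on_shard(shard, group_id):
    """Per-shard part of the group query, sorted newest first."""
    rows = [r for r in shard if r[1] == group_id]
    return sorted(rows, key=lambda r: r[2], reverse=True)


def photos_for_group(group_id, limit=None):
    """Scatter-gather: ask every shard in parallel, then merge the sorted partial results."""
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        partials = list(pool.map(lambda s: _group_rows_on_shard(s, group_id), shards))
    # heapq.merge keeps the newest-first order without re-sorting everything.
    merged = list(heapq.merge(*partials, key=lambda r: r[2], reverse=True))
    return merged if limit is None else merged[:limit]
```

With a LIMIT on the group query, each shard only needs to return its own top rows, so the merge step stays cheap even when the table is large.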
There are two other options to consider:
Duplicate the table and have one sharded by user and one sharded by group. All reads will be against a single shard but you have to write twice.
Use three tables: a photo table sharded by photo id, user_photos (user_id, photo_id, and other fields) sharded by user, and group_photos (group_id, photo_id, and other fields) sharded by group.
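The three-table option could look roughly like the sketch below. Again this is only an illustration under assumptions: the shard count, modulo routing, in-memory structures, the `data` field, and the function names are all hypothetical, and date_added is an integer timestamp. Each mapping table is read from a single shard, and the photo rows are then fetched by photo id.

```python
NUM_SHARDS = 4  # hypothetical shard count

# photo table, sharded by photo_id: photo_id -> photo row
photo_shards = [dict() for _ in range(NUM_SHARDS)]
# user_photos, sharded by user_id: rows of (user_id, photo_id, date_added)
user_photo_shards = [[] for _ in range(NUM_SHARDS)]
# group_photos, sharded by group_id: rows of (group_id, photo_id, date_added)
group_photo_shards = [[] for _ in range(NUM_SHARDS)]


def add_photo(photo_id, user_id, group_id, date_added, data):
    """One write to the photo table plus one write to each mapping table."""
    photo_shards[photo_id % NUM_SHARDS][photo_id] = {
        "photo_id": photo_id,
        "user_id": user_id,
        "group_id": group_id,
        "date_added": date_added,
        "data": data,
    }
    user_photo_shards[user_id % NUM_SHARDS].append((user_id, photo_id, date_added))
    group_photo_shards[group_id % NUM_SHARDS].append((group_id, photo_id, date_added))


def _fetch_photos(photo_ids):
    """Primary-key lookups; each id is routed to its own photo shard."""
    return [photo_shards[pid % NUM_SHARDS][pid] for pid in photo_ids]


def _newest_first(mapping_shard, owner_id):
    """Filter one mapping shard by owner and return photo ids, newest first."""
    rows = [r for r in mapping_shard if r[0] == owner_id]
    rows.sort(key=lambda r: r[2], reverse=True)
    return [r[1] for r in rows]


def photos_for_user(user_id):
    ids = _newest_first(user_photo_shards[user_id % NUM_SHARDS], user_id)
    return _fetch_photos(ids)


def photos_for_group(group_id):
    ids = _newest_first(group_photo_shards[group_id % NUM_SHARDS], group_id)
    return _fetch_photos(ids)
```

The duplicated-table option has the same shape, except that the full row is written to both a user-sharded and a group-sharded copy, so no second lookup by photo id is needed. In the three-table version, the "other fields" kept in the mapping tables can serve the same purpose for list pages. In both cases the writes go to different shards, so this sketch makes no attempt to keep them atomic.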
We see these scenarios a lot and these are the usual approaches our customers take.