I am still new to PHP and I was wondering which alternative would be better or maybe someone could suggest a better way.
I have a set of users and I have to track all of their interactions with posts. If a user taps a button, it will add the post to a list, and if they tap it again, it will remove the post. So would it be better to:
Have a column of a JSON array of postIDs stored in the table for each user (probably thousands).
-or-
Have a separate table with every save (combination of postID and userID) (probably millions) and return all results where the userID matches?
For the purposes of this question, there are two tables: Table A is users and Table B is posts. How should I store all of the user’s saved posts?
EDIT: Sorry, but I didn’t mention that posts will have multiple user interactions and users will have multiple post interactions (Many to Many relationship). I think that would affect Bob’s answer.
This is an interesting question!
The solution really depends on your expected use case. If each user has a list of posts they’ve tagged, and that is all the information you need, it will be expedient to list these as a field in the user’s table (or in their blob if you’re using a nosql backend – a viable option if this is your use case!). There will be no impact on transmission time since the list will be the same size either way, but in this solution you will probably save on lookup time, since you’re only using one table and dbs will optimize to keep this information close together.
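A minimal sketch of this first option, for illustration only. It uses Python with an in-memory SQLite database so that it runs on its own; the SQL would look much the same from PHP. The table layout, the saved_posts column name and the toggle_saved helper are all assumptions, not something given in the question.

```python
import json
import sqlite3

# In-memory database; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " saved_posts TEXT NOT NULL DEFAULT '[]'"  # JSON array of post IDs
    ")"
)
conn.execute("INSERT INTO users (id) VALUES (1)")


def toggle_saved(conn, user_id, post_id):
    """Add post_id to the user's JSON list, or remove it if already there."""
    row = conn.execute(
        "SELECT saved_posts FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    saved = json.loads(row[0])
    if post_id in saved:
        saved.remove(post_id)
    else:
        saved.append(post_id)
    # The whole list is rewritten on every tap.
    conn.execute(
        "UPDATE users SET saved_posts = ? WHERE id = ?",
        (json.dumps(saved), user_id),
    )
    return saved


toggle_saved(conn, 1, 42)  # save post 42
toggle_saved(conn, 1, 7)   # save post 7
toggle_saved(conn, 1, 42)  # tap again: post 42 is removed
```

Reading one user's list is a single-row lookup, which is the appeal here. The cost is that every tap is a read-modify-write of the whole list, and the database cannot enforce that the IDs refer to real posts.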
On the other hand, if you have to be able to query a given post for all the users that have tagged it, then option two will be much better. In the former method, you’d have to query all users and see if each one had the post. In this option, you simply have to find all the relations and work from there. Presumably you’d have a user table, a post table and a user_post table with foreign keys to the first two tables. There are other ways to do this, but they necessitate maintaining multiple lists and cross-checking each time, which is an expensive and error-prone set of operations.
Note that the latter option shouldn’t choke on ‘millions’ of connections, since the db should be optimized for this sort of quick read. (pro tip: index the proper columns!) Do be careful about any data massage, though. One unnecessary for-loop will kill your performance.
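Here is a rough sketch of the three-table layout, again in Python with in-memory SQLite purely so it is self-contained. The column names, the index name and the three helper functions are assumptions made for the example; only the table names come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY);
CREATE TABLE post (id INTEGER PRIMARY KEY);
CREATE TABLE user_post (
    user_id INTEGER NOT NULL REFERENCES user(id),
    post_id INTEGER NOT NULL REFERENCES post(id),
    PRIMARY KEY (user_id, post_id)   -- one row per save, indexed by user
);
-- Second index so "who saved this post?" is also a quick lookup.
CREATE INDEX idx_user_post_post ON user_post (post_id);
INSERT INTO user (id) VALUES (1), (2);
INSERT INTO post (id) VALUES (10), (20);
""")


def toggle_save(conn, user_id, post_id):
    """Remove the save if it exists, otherwise create it. Returns True if now saved."""
    cur = conn.execute(
        "DELETE FROM user_post WHERE user_id = ? AND post_id = ?",
        (user_id, post_id),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO user_post (user_id, post_id) VALUES (?, ?)",
            (user_id, post_id),
        )
        return True
    return False


def posts_saved_by(conn, user_id):
    rows = conn.execute(
        "SELECT post_id FROM user_post WHERE user_id = ? ORDER BY post_id",
        (user_id,),
    )
    return [r[0] for r in rows]


def users_who_saved(conn, post_id):
    rows = conn.execute(
        "SELECT user_id FROM user_post WHERE post_id = ? ORDER BY user_id",
        (post_id,),
    )
    return [r[0] for r in rows]


toggle_save(conn, 1, 10)
toggle_save(conn, 1, 20)
toggle_save(conn, 2, 10)
toggle_save(conn, 1, 20)  # second tap removes the save
```

After these calls, posts_saved_by(conn, 1) gives [10] and users_who_saved(conn, 10) gives [1, 2]. Both directions are a single indexed query, so there is no need to loop over users or posts in application code, which is the kind of for-loop to avoid.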