So here is the problem:
I have a document in collection A. When it is first created, it is not referenced by any other documents. At some point a document in collection B will be created, and it will reference the ObjectId of a document in collection A.
What is the best way to find all documents in collection A that aren’t referenced by a document in collection B?
I understand MongoDB doesn’t support joins, but I wonder if there is a solution to this problem other than getting all referenced ObjectIds from Collection B and finding documents in collection A that aren’t in that list, as this solution likely wouldn’t scale well.
Can I just embed the document from Collection A into the document from collection B and then remove it from Collection A? Is that the best solution?
Thanks for your help and comments.
Lots of options:
1) Add the id of the B document to an array in the A document (a reverse reference). Now you can look for A documents that don’t have any elements in that array. Issue: array may get too large for document size if you have lots of cross references.
2) Add a collection C that tracks references between A’s and B’s. Behaves like a join table.
3) Have a simple flag in A ‘referenced’. When you add a B mark all of the A’s that it references as ‘referenced’. When you remove a B, do a scan of B for all of the A’s that are referenced by it and unflag any A’s that no longer have a reference. Issue: could get out of sync.
4) Use map reduce on B to create a collection containing the ids of all the A’s that are referenced by any B. Use that collection to mark all the A’s that are referenced (after unmarking all of them first). Can use this to fix (3) periodically.
5) Put both document types in the same collection and use map reduce to emit the _id and a flag to say ‘in A’ or ‘referenced by B’. In the reduce step look for any groups that have ‘in A’ but not ‘referenced by B’.
…
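To make option 1 concrete, here is a rough sketch of the bookkeeping, with a full rebuild from B along the lines of option 4 for periodic repair. It is only an illustration: plain Python dicts stand in for the two collections, and every name in it (referenced_by, a_ids, add_b and so on) is made up for the example, not part of MongoDB or any driver.

```python
# In-memory stand-ins for the two collections: plain dicts keyed by _id.
# All field and function names here are hypothetical.
collection_a = {}
collection_b = {}


def add_a(a_id):
    """Create an A document with an empty reverse-reference array."""
    collection_a[a_id] = {"_id": a_id, "referenced_by": []}


def add_b(b_id, a_ids):
    """Create a B document and record its _id on every A it references."""
    collection_b[b_id] = {"_id": b_id, "a_ids": list(a_ids)}
    for a_id in a_ids:
        refs = collection_a[a_id]["referenced_by"]
        if b_id not in refs:  # same effect as MongoDB's $addToSet
            refs.append(b_id)


def remove_b(b_id):
    """Delete a B document and drop its _id from the A's it referenced."""
    doc = collection_b.pop(b_id)
    for a_id in doc["a_ids"]:
        refs = collection_a[a_id]["referenced_by"]
        if b_id in refs:  # same effect as MongoDB's $pull
            refs.remove(b_id)


def unreferenced_a():
    """Return the _ids of A documents whose reverse-reference array is empty."""
    return sorted(a_id for a_id, doc in collection_a.items()
                  if not doc["referenced_by"])


def rebuild_reverse_references():
    """Recompute every referenced_by array from B (periodic repair)."""
    for a_doc in collection_a.values():
        a_doc["referenced_by"] = []
    for b_doc in collection_b.values():
        for a_id in b_doc["a_ids"]:
            refs = collection_a[a_id]["referenced_by"]
            if b_doc["_id"] not in refs:
                refs.append(b_doc["_id"])


add_a("a1")
add_a("a2")
add_a("a3")
add_b("b1", ["a1", "a2"])
print(unreferenced_a())  # ['a3']
remove_b("b1")
print(unreferenced_a())  # ['a1', 'a2', 'a3']
```

Against a real collection, the lookup in unreferenced_a would presumably become a query for an empty array, for example { referenced_by: { $size: 0 } }, and the updates in add_b and remove_b would use $addToSet and $pull. The two writes per B document are not atomic, which is why a periodic rebuild is worth keeping around.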