I have two very large (30,000+ documents) collections, one contains words extracted from a text

Question

Asked: May 25, 2026

I have two very large (30,000+ documents) collections: one contains words extracted from a text file (collection name ‘word’) and one contains words from a dictionary (collection name ‘dictionary’).

How can I get the words that exist in both collections?

(I’ve simplified the situation. Documents inside the ‘word’ collection contain metadata about the words, so each word has to be a separate document.)

1 Answer

Editorial Team · Answer 1 · 2026-05-25T14:05:43+00:00

Copy both collections into a single collection, adding a discriminator field if necessary so you can tell which source each document came from.

Run map-reduce on that combined collection.

In Map, emit the word as the key and a value of either {instance: 1, dict: 0} or {instance: 0, dict: 1}, depending on whether the document being mapped is a word instance or a dictionary entry. (You can add more fields to the value as necessary.)

In Reduce, sum the instance and dict counts for each word (as usual).

Now query the output for instance > 0 and dict > 0, and you have all of the words that appear in both collections.
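
The steps above can be sketched in plain Python to show how the pieces fit together. This is only an in-memory illustration of the map, reduce, and query logic, not the database's own map-reduce API; the sample documents, the `kind` discriminator field, and the function names `map_doc`, `reduce_values`, and `map_reduce` are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the two collections.
word_docs = [
    {"word": "apple", "line": 1},
    {"word": "zzyx", "line": 2},
    {"word": "apple", "line": 7},
    {"word": "pear", "line": 9},
]
dictionary_docs = [{"word": "apple"}, {"word": "pear"}, {"word": "plum"}]

# Step 1: copy both collections into one, tagging each document with a
# discriminator field ("kind" is an assumed name).
combined = [dict(doc, kind="instance") for doc in word_docs]
combined += [dict(doc, kind="dict") for doc in dictionary_docs]


def map_doc(doc):
    """Emit (word, counts) for one document, based on its discriminator."""
    if doc["kind"] == "instance":
        return doc["word"], {"instance": 1, "dict": 0}
    return doc["word"], {"instance": 0, "dict": 1}


def reduce_values(values):
    """Sum the instance and dict counts emitted for a single word."""
    total = {"instance": 0, "dict": 0}
    for value in values:
        total["instance"] += value["instance"]
        total["dict"] += value["dict"]
    return total


def map_reduce(docs):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        key, value = map_doc(doc)
        grouped[key].append(value)
    return {key: reduce_values(values) for key, values in grouped.items()}


results = map_reduce(combined)

# Final query: keep words seen at least once in each source.
common = sorted(
    word
    for word, counts in results.items()
    if counts["instance"] > 0 and counts["dict"] > 0
)
print(common)  # ['apple', 'pear']
```

Because the reduce step only adds counts, it gives the same totals no matter how the values for a word are grouped or ordered, which is what a real map-reduce engine generally requires of a reduce function.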
