I have a database of documents which are tagged with keywords. I am trying

Question

Asked: June 3, 2026 (2026-06-03T05:40:12+00:00)

I have a database of documents which are tagged with keywords. I am trying to find (and then count) the unique tags which are used alongside each other. So for any given tag, I want to know what tags have been used alongside that tag.

For example, if I had one document which had the tags [fruit, apple, plant] then when I query [apple] I should get [fruit, plant]. If another document has tags [apple, banana] then my query for [apple] would give me [fruit, plant, banana] instead.

This is my map function which emits all the tags and their neighbours:

function(doc) {
  if(doc.tags) {
    doc.tags.forEach(function(tag1) {
      doc.tags.forEach(function(tag2) {
        emit(tag1, tag2);
      });
    });
  }
}

So in my example above, it would emit

apple -- fruit
apple -- plant
apple -- banana
fruit -- apple
fruit -- plant
...

My question is: what should my reduce function be? The reduce function should essentially filter out the duplicates and group them all together.

I have tried a number of different approaches, but my database server (CouchDB) keeps giving me an error: reduce_overflow_error. Reduce output must shrink more rapidly.

EDIT: I’ve found something that seems to work, but I’m not sure why. I see there is an optional “rereduce” parameter to the reduce function call. If I ignore these special cases, then it stops throwing reduce_overflow_errors. Can anyone explain why? And also, should I just be ignoring these, or will this bite me in the ass later?

function(keys, values, rereduce) {
  if(rereduce) return null; // Throws error without this.

  var a = [];
  values.forEach(function(tag) {
    if(a.indexOf(tag) < 0) a.push(tag);
  });
  return a;
}

1 Answer

Editorial Team · Answer 1 · 2026-06-03T05:40:13+00:00

Your answer is nice, and as I said in the comments, if it works for you, that’s all you should care about. Here is an alternative implementation in case you ever bump into performance problems.

CouchDB likes tall lists, not fat lists. Instead of each view row holding an array of every tag seen so far, this solution keeps the “sibling” tag in the key of the view row and then groups the rows, which guarantees exactly one row per unique pair of tags. Every row is just two tags, but there could be thousands or millions of rows: a tall list, which CouchDB prefers.
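To make the "fat list" problem concrete, here's a rough sketch (mine, not taken from your code) of what a rereduce-aware version of the array-collecting reduce could look like. When rereduce is true, CouchDB passes null for keys and hands the outputs of earlier reduce calls in as values, so each value is already an array of tags. The name uniqueTagsReduce is made up for illustration.

```javascript
// Sketch of a "fat list" reduce: collects unique tags into one array.
// CouchDB calls reduce as (keys, values, rereduce). When rereduce is true,
// keys is null and values holds the outputs of earlier reduce calls.
function uniqueTagsReduce(keys, values, rereduce) {
  var seen = [];
  values.forEach(function (value) {
    // On rereduce each value is itself an array of tags; otherwise a single tag.
    var tags = rereduce ? value : [value];
    tags.forEach(function (tag) {
      if (seen.indexOf(tag) < 0) seen.push(tag);
    });
  });
  return seen;
}
```

Even when the rereduce case is handled like this, the returned array keeps growing with the number of distinct tags, and as far as I understand that growth is what the reduce_overflow_error message is complaining about.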

The main idea is to emit each pair of tags as a 2-element array key. Suppose we have one doc, tagged fruit, apple, plant.

// Pseudo-code visualization of view rows (before reduce)
// Key         , Value
[apple, fruit ], 1
[apple, plant ], 1 // Basically this is every combination of 2 tags in the set.
[fruit, apple ], 1
[fruit, plant ], 1
[plant, apple ], 1
[plant, fruit ], 1

Next I tag something apple, banana.

// Pseudo-code visualization of view rows (before reduce)
// Key         , Value
[apple, banana], 1 // This is from my new doc
[apple, fruit ], 1
[apple, plant ], 1
[banana, apple], 1 // This is also from my new doc
[fruit, apple ], 1
[fruit, plant ], 1
[plant, apple ], 1
[plant, fruit ], 1

Why is the value always 1? Because then I can use a very simple built-in reduce function, _sum, to count how many times each tag pair occurs. Next, query with ?group_level=2 and CouchDB will give you the unique pairs, each with its total count.
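To get only the siblings of one tag, you can add a key range to that query. Here's a minimal sketch of building the URL. The helper name siblingQueryUrl and the database, design document and view names in the example URL are placeholders I made up; group_level, startkey and endkey are the standard view query parameters.

```javascript
// Hypothetical helper: builds the view URL for all siblings of one tag.
// The database, design document and view names below are placeholders.
function siblingQueryUrl(baseUrl, tag) {
  var params = [
    "group_level=2",
    // Keys are [tag, sibling]. In CouchDB's collation {} sorts after any
    // string, so this range covers every key whose first element is the tag.
    "startkey=" + encodeURIComponent(JSON.stringify([tag])),
    "endkey=" + encodeURIComponent(JSON.stringify([tag, {}]))
  ];
  return baseUrl + "?" + params.join("&");
}

var url = siblingQueryUrl(
  "http://localhost:5984/docs/_design/tags/_view/siblings", "apple");
```

Each row that comes back has a key of [apple, sibling] and a value holding the count for that pair.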

A map function to produce this kind of view might look like this:

function(doc) {
  // Emit "sibling" tags, keyed on tag pairs.
  var tags = doc.tags || []
  tags.forEach(function(tag1) {
    tags.forEach(function(tag2) {
      if(tag1 !== tag2)
        emit([tag1, tag2], 1)
    })
  })
}
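If you want to sanity-check the idea without a running server, here's a small local simulation in plain JavaScript. It only mimics what the map function, _sum and group_level=2 produce together; it is not how CouchDB works internally, and siblingCounts is a name I made up.

```javascript
// Local simulation only: mimics emit(), the built-in _sum reduce and
// group_level=2 in plain JavaScript. Not CouchDB's actual implementation.
function siblingCounts(docs) {
  // counts[tag][sibling] = number of emitted [tag, sibling] rows.
  // Null-prototype objects avoid clashes with tags like "constructor".
  var counts = Object.create(null);
  docs.forEach(function (doc) {
    var tags = doc.tags || [];
    tags.forEach(function (tag1) {
      tags.forEach(function (tag2) {
        if (tag1 === tag2) return; // a tag is not its own sibling
        counts[tag1] = counts[tag1] || Object.create(null);
        counts[tag1][tag2] = (counts[tag1][tag2] || 0) + 1;
      });
    });
  });
  return counts;
}

var counts = siblingCounts([
  { tags: ["fruit", "apple", "plant"] },
  { tags: ["apple", "banana"] }
]);
var appleSiblings = Object.keys(counts.apple).sort();
```

With the two example docs, appleSiblings holds banana, fruit and plant, each with a count of 1. Note that a doc listing the same tag twice would inflate the counts, both here and in the real view.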
