The Archive Base

Editorial Team
Asked: June 3, 2026 (2026-06-03T05:40:12+00:00)

I have a database of documents which are tagged with keywords. I am trying to find (and then count) the unique tags which are used alongside each other. So for any given tag, I want to know what tags have been used alongside that tag.

For example, if I had one document which had the tags [fruit, apple, plant] then when I query [apple] I should get [fruit, plant]. If another document has tags [apple, banana] then my query for [apple] would give me [fruit, plant, banana] instead.

This is my map function which emits all the tags and their neighbours:

function(doc) {
  if(doc.tags) {
    doc.tags.forEach(function(tag1) {
      doc.tags.forEach(function(tag2) {
        emit(tag1, tag2);
      });
    });
  }
}

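So in my example above, it would emit the following (plus self-pairs such as apple -- apple, since the loops do not skip the case where tag1 and tag2 are the same):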

apple -- fruit
apple -- plant
apple -- banana
fruit -- apple
fruit -- plant
...

My question is: what should my reduce function be? The reduce function should essentially filter out the duplicates and group them all together.

I have tried a number of different approaches, but my database server (CouchDB) keeps giving me an error: reduce_overflow_error. Reduce output must shrink more rapidly.


EDIT: I’ve found something that seems to work, but I’m not sure why. I see there is an optional “rereduce” parameter to the reduce function call. If I ignore these special cases, then it stops throwing reduce_overflow_errors. Can anyone explain why? And also, should I just be ignoring these, or will this bite me in the ass later?

function(keys, values, rereduce) {
  if(rereduce) return null; // Throws error without this.

  var a = [];
  values.forEach(function(tag) {
    if(a.indexOf(tag) < 0) a.push(tag);
  });
  return a;
}
1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 5:40 am

    Your answer is nice, and as I said in the comments, if it works for you, that’s all you should care about. Here is an alternative implementation in case you ever bump into performance problems.

    CouchDB likes tall lists, not fat lists. Instead of each view row carrying an array of every tag seen so far, this solution keeps the “sibling” tag in the key of the view row, and then groups the rows to guarantee one unique sibling tag per row. Every row is just two tags, but there could be thousands or millions of rows: a tall list, which CouchDB prefers.
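For comparison, here is a rough sketch of what the "fat list" reduce from the question could look like if it handled the rereduce pass instead of returning null. On a rereduce call, `values` holds the arrays returned by earlier reduce calls rather than the emitted tags, so they need flattening before deduplicating. The name `reduceUniqueTags` is only there so the sketch can be run locally; in a design document the function would be anonymous.

```javascript
// Sketch of a rereduce-aware version of the question's reduce function.
// First pass: `values` holds the emitted tags.
// Rereduce pass: `values` holds arrays returned by earlier reduce calls.
function reduceUniqueTags(keys, values, rereduce) {
  var unique = [];
  values.forEach(function (value) {
    // Normalize: a rereduce value is an array of tags, a map value is one tag.
    var tags = rereduce ? value : [value];
    tags.forEach(function (tag) {
      if (unique.indexOf(tag) < 0) unique.push(tag);
    });
  });
  return unique;
}

// Local simulation: two first-pass calls, then one rereduce over their results.
var part1 = reduceUniqueTags(null, ["fruit", "plant", "fruit"], false);
var part2 = reduceUniqueTags(null, ["banana", "plant"], false);
var merged = reduceUniqueTags(null, [part1, part2], true);
console.log(merged); // [ 'fruit', 'plant', 'banana' ]
```

Returning null on rereduce most likely silences the overflow check only because null is tiny; once CouchDB has to combine partial results, the final value would be null rather than the tag list. The sketch above gives correct results, but its output still grows with the number of distinct sibling tags, so it can hit the same reduce_overflow_error on a large data set.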

    The main idea is to emit each pair of tags as a 2-element array key. Suppose we have one doc, tagged fruit, apple, plant.

    // Pseudo-code visualization of view rows (before reduce)
    // Key         , Value
    [apple, fruit ], 1
    [apple, plant ], 1 // Basically this is every combination of 2 tags in the set.
    [fruit, apple ], 1
    [fruit, plant ], 1
    [plant, apple ], 1
    [plant, fruit ], 1
    

    Next I tag something apple, banana.

    // Pseudo-code visualization of view rows (before reduce)
    // Key         , Value
    [apple, banana], 1 // This is from my new doc
    [apple, fruit ], 1
    [apple, plant ], 1
    [banana, apple], 1 // This is also from my new doc
    [fruit, apple ], 1
    [fruit, plant ], 1
    [plant, apple ], 1
    [plant, fruit ], 1
    

    Why is the value always 1? Because that lets me use the simple built-in reduce function _sum to count each tag pair. Next, query with ?group_level=2 and CouchDB will give you one row per unique pair, with its total count.
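To get only the siblings of one tag, the grouped query can be restricted to a key range. The sketch below builds such a query string. The database name `docs`, design document `tags`, and view name `siblings` are made up for illustration, as is the helper `siblingsQuery`. It relies on CouchDB's view collation, where `[tag]` sorts before every `[tag, x]` and an object sorts after any string.

```javascript
// Hypothetical names: database "docs", design document "tags", view "siblings".
function siblingsQuery(tag) {
  var params = [
    "group_level=2",
    // Keys are [tag, sibling]; [tag] sorts before every [tag, x] ...
    "startkey=" + encodeURIComponent(JSON.stringify([tag])),
    // ... and {} sorts after every string in CouchDB's view collation.
    "endkey=" + encodeURIComponent(JSON.stringify([tag, {}]))
  ];
  return "/docs/_design/tags/_view/siblings?" + params.join("&");
}

console.log(siblingsQuery("apple"));
```

Each row in the response then has a key of the form [tag, sibling] and a value equal to the number of documents carrying both tags.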

    A map function to produce this kind of view might look like this:

    function(doc) {
      // Emit "sibling" tags, keyed on tag pairs.
      var tags = doc.tags || []
      tags.forEach(function(tag1) {
        tags.forEach(function(tag2) {
          if(tag1 != tag2)
            emit([tag1, tag2], 1)
        })
      })
    }
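To check the idea without a server, the map function above can be run locally against the two example documents. This is only a simulation: the `emit` stub, the `rows` array, and the grouping loop stand in for what CouchDB does itself with `_sum` and `group_level=2`, and the name `mapSiblings` is added just so the function can be called.

```javascript
// Local simulation only; CouchDB itself supplies emit(), sorting and _sum.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// The map function from above, given a name so it can be called here.
function mapSiblings(doc) {
  var tags = doc.tags || [];
  tags.forEach(function (tag1) {
    tags.forEach(function (tag2) {
      if (tag1 != tag2) emit([tag1, tag2], 1);
    });
  });
}

[{ tags: ["fruit", "apple", "plant"] }, { tags: ["apple", "banana"] }]
  .forEach(mapSiblings);

// Emulate _sum with group_level=2: add up the values of identical keys.
var counts = {};
rows.forEach(function (row) {
  var k = JSON.stringify(row.key);
  counts[k] = (counts[k] || 0) + row.value;
});

// Siblings of "apple" are the second elements of keys starting with "apple".
var appleSiblings = Object.keys(counts)
  .map(function (k) { return JSON.parse(k); })
  .filter(function (key) { return key[0] === "apple"; })
  .map(function (key) { return key[1]; })
  .sort();
console.log(appleSiblings); // [ 'banana', 'fruit', 'plant' ]
```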
    