The Archive Base


Editorial Team
Asked: May 27, 2026 (2026-05-27T17:47:58+00:00)


Using two databases to illustrate this example: CouchDB and Cassandra.

CouchDB

CouchDB uses a B+ tree for document indexes, with a clever modification so that it works in their append-only environment. More specifically, as documents are modified (insert/update/delete), each new revision is appended to the running database file, followed right after by the full Leaf -> Node path from the B+ tree, covering all the nodes affected by that revision.

These piecemeal index revisions are inlined right alongside the modifications, so the full index is the union of the most recent index modifications appended at the end of the file and older pieces further back in the data file that are still relevant and haven’t been modified yet.
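To make the append-only layout concrete, here is a minimal sketch. It is not CouchDB's code or file format: it swaps the B+ tree for a plain (unbalanced) binary search tree to stay short, models the file as a Python list, and the names `AppendOnlyStore`, `Node`, `put` and `get` are made up for illustration. What it does show is the idea described above: every write appends the document and then fresh copies of the nodes on its path, leaf first and root last, while untouched nodes stay where they were further back in the log.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: str
    doc_offset: int        # log position of the document record
    left: Optional[int]    # log position of the left child node, or None
    right: Optional[int]   # log position of the right child node, or None


class AppendOnlyStore:
    """Hypothetical append-only store; a binary tree stands in for the B+ tree."""

    def __init__(self):
        self.log = []      # append-only: existing entries are never rewritten
        self.root = None   # log position of the newest root node

    def _append(self, record):
        self.log.append(record)
        return len(self.log) - 1

    def put(self, key, doc):
        # 1. Append the document itself.
        doc_offset = self._append(("doc", key, doc))
        # 2. Append new copies of every node on the path, leaf first, root last.
        self.root = self._insert(self.root, key, doc_offset)

    def _insert(self, offset, key, doc_offset):
        if offset is None:
            return self._append(Node(key, doc_offset, None, None))
        node = self.log[offset]
        if key == node.key:
            new = node._replace(doc_offset=doc_offset)
        elif key < node.key:
            new = node._replace(left=self._insert(node.left, key, doc_offset))
        else:
            new = node._replace(right=self._insert(node.right, key, doc_offset))
        return self._append(new)   # the old copy stays in the log, now unreachable

    def get(self, key):
        offset = self.root
        while offset is not None:
            node = self.log[offset]
            if key == node.key:
                return self.log[node.doc_offset][2]
            offset = node.left if key < node.key else node.right
        return None


store = AppendOnlyStore()
for k, v in [("b", 1), ("a", 2), ("c", 3), ("a", 4)]:
    store.put(k, v)
print(store.get("a"), len(store.log))
```

After the last `put`, the newest root sits at the end of the log, but it still points back at the old node for `"c"`, which was not on the updated path. That is the "union of recent and older pieces" in miniature.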

Searching the B+ tree is O(log n).

Cassandra

Cassandra keeps record keys sorted in memory, in tables (let’s think of them as arrays for this question), and from time to time writes them out as separate sorted-string tables (SSTables).

We can think of the collection of all of these tables as the “index” (from what I understand).

Cassandra is required to compact/combine these sorted-string tables from time to time, creating a more complete file representation of the index.

Searching a sorted array is O(log n).
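As a rough illustration of that model (not Cassandra's implementation), the sketch below keeps an in-memory table, flushes it to immutable sorted lists, binary-searches those lists on read, and merges them on compaction. `TinyLSM`, `memtable_limit` and the method names are assumptions made up for this example.

```python
import bisect


class TinyLSM:
    """Hypothetical toy model: in-memory table plus flushed sorted tables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}            # in-memory writes
        self.sstables = []            # oldest first; each a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the in-memory table out as one immutable sorted table.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest table first, so the most recent value wins.
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))   # O(log n) per table
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self):
        # Merge all tables into one; newer values overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
for k, v in [("k1", "v1"), ("k2", "v2"), ("k1", "v1b"), ("k3", "v3")]:
    db.put(k, v)
print(len(db.sstables), db.get("k1"))
db.compact()
print(len(db.sstables), db.get("k1"))
```

Note that each individual table is searched in O(log n), but a read may have to visit several tables until compaction merges them.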

Question

Assuming a similar level of complexity between maintaining partial B+ tree chunks in CouchDB and partial sorted-string indices in Cassandra, and given that both provide O(log n) search time, which one do you think would make a better representation of a database index, and why?

I am specifically curious if there is an implementation detail about one over the other that makes it particularly attractive or if they are both a wash and you just pick whichever data structure you prefer to work with/makes more sense to the developer.

Thank you for the thoughts.


1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 5:47 pm

    When comparing a BTree index to an SSTable index, you should consider the write complexity:

    • When writing randomly to a copy-on-write BTree, you will incur random reads (to copy the leaf node and its path). So while the writes may be sequential on disk, for datasets larger than RAM these random reads will quickly become the bottleneck. For an SSTable-like index, no such read occurs on write; there will only be sequential writes.

    • You should also consider that in the worst case, every update to a BTree could incur log_b N IOs; that is, you could end up writing 3 or 4 blocks for every key. If key size is much less than block size, this is extremely expensive. For an SSTable-like index, each write IO will contain as many fresh keys as it can, so the IO cost for each key is more like 1/B, where B is the number of keys that fit in a block.
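To put rough numbers on the two bullets above, here is a back-of-the-envelope calculation. The figures are assumptions picked for illustration only (a billion keys, a fanout of 256, and 256 keys per block); they are not measurements, and the function names are made up.

```python
def btree_ios_per_key(num_keys, fanout):
    """Worst case for a copy-on-write tree: one block per level, i.e. ceil(log_b N)."""
    height, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height


def sstable_ios_per_key(keys_per_block):
    """Sequential writes pack a full block of fresh keys into each IO: 1/B."""
    return 1 / keys_per_block


N = 10**9          # assumed number of keys
FANOUT = 256       # assumed B-tree branching factor b
PER_BLOCK = 256    # assumed keys per block B

btree = btree_ios_per_key(N, FANOUT)       # 4 block writes per key
sstable = sstable_ios_per_key(PER_BLOCK)   # 1/256 of a block write per key
print(btree, sstable, btree / sstable)     # ratio is 1024.0
```

With these assumed values the gap in write IOs per key is about three orders of magnitude, before even counting the random reads from the first bullet.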

    In practice, this makes SSTable-like indexes thousands of times faster (for random writes) than BTrees.

    When considering implementation details, we have found it a lot easier to implement SSTable-like indexes (almost) lock-free, whereas locking strategies for BTrees have become quite complicated.

    You should also reconsider your read costs. You are correct that a BTree is O(log_b N) random IOs for random point reads, but an SSTable-like index is actually O(#sstables · log_b N). Without a decent merge scheme, #sstables is proportional to N. There are various tricks to get round this (using Bloom filters, for instance), but these don’t help with small, random range queries. This is what we found with Cassandra:

    Cassandra under heavy write load
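The read-cost point can be sketched in a few lines. This is a toy model, not Cassandra's or Castle's code: `BloomFilter`, `SSTable`, `point_get` and `range_scan` are hypothetical names, and the filter size and hash count are arbitrary assumptions. It shows why a Bloom filter lets a point read skip most tables, while a range query has nothing to ask the filter and must consult every table.

```python
import bisect
import hashlib


class BloomFilter:
    """Tiny Bloom filter; sizes are arbitrary assumptions for the example."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    def __init__(self, items):
        self.rows = sorted(items)          # sorted list of (key, value)
        self.bloom = BloomFilter()
        for key, _ in self.rows:
            self.bloom.add(key)


def point_get(tables, key):
    """Return (value, number of tables actually binary-searched)."""
    searched = 0
    for table in reversed(tables):         # newest first
        if not table.bloom.might_contain(key):
            continue                       # skipped without touching the table
        searched += 1
        i = bisect.bisect_left(table.rows, (key,))
        if i < len(table.rows) and table.rows[i][0] == key:
            return table.rows[i][1], searched
    return None, searched


def range_scan(tables, lo, hi):
    """Return (rows with lo <= key < hi, number of tables consulted)."""
    merged = {}
    for table in tables:                   # oldest first; the filter cannot help here
        start = bisect.bisect_left(table.rows, (lo,))
        for key, value in table.rows[start:]:
            if key >= hi:
                break
            merged[key] = value            # newer tables overwrite older values
    return sorted(merged.items()), len(tables)


tables = [
    SSTable([("a", 1), ("b", 2)]),
    SSTable([("c", 3), ("d", 4)]),
    SSTable([("a", 10), ("e", 5)]),        # newest
]
print(point_get(tables, "a"))
print(range_scan(tables, "a", "d"))
```

In this model the cost of `range_scan` grows with the number of tables no matter how small the range is, which is the behaviour the answer describes.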

    This is why Castle, our (GPL) storage engine, does merges slightly differently, and can achieve much better range query performance (O(log^2 N)) with a slight trade-off in write performance (O(log^2 N / B)). In practice we find it to be quicker than Cassandra’s SSTable index for writes as well.

    If you want to know more about this, I’ve given a talk about how it works:

    • podcast
    • slides
