According to Riak’s docs (using Python bindings), get_keys() is extremely expensive and not suitable


Asked: May 25, 2026 (2026-05-25T22:00:32+00:00)


According to Riak’s docs (using Python bindings), get_keys() is extremely expensive and not suitable for production. My question is whether a very simple map query is suitable. For instance, using a map stage only with the function:

function(v) { return [v.key]; }

Is this going to perform better than get_keys()? Why wouldn’t Riak ship with this implementation instead of the current version of get_keys()? Is there a better way I should be listing keys for a bucket?


1 Answer

Editorial Team · Answer 1 · 2026-05-25T22:00:33+00:00

The get_keys() function calls list_keys in the back end and is considered to be an expensive operation because it performs a full scan of the key space. Depending on your Riak back end, this could also involve a full scan of the data as stored on disk (InnoStore springs to mind). The default storage back end (Bitcask) stores all of your keys in memory, so performance shouldn’t be as much of a problem.
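To make the comparison concrete, here is a rough sketch of how the map-only job from the question could be built as a JSON body for Riak’s HTTP MapReduce endpoint (/mapred). The helper name build_key_listing_job and the bucket name "users" are made up for illustration. Note that passing a bare bucket name as the input asks Riak to run the map over every object in the bucket, so it still has to enumerate the whole key space first.

```python
import json


def build_key_listing_job(bucket):
    """Build the JSON body for a map-only job that returns every key in `bucket`."""
    return {
        # A bare bucket name as input means "every object in the bucket",
        # so the keys must be enumerated before the map phase can run.
        "inputs": bucket,
        "query": [
            {
                "map": {
                    "language": "javascript",
                    # The map function from the question, unchanged.
                    "source": "function(v) { return [v.key]; }",
                    "keep": True,
                }
            }
        ],
    }


# Serialized body, ready to be POSTed with Content-Type: application/json.
job_body = json.dumps(build_key_listing_job("users"))
```

The sketch only builds the request body. Sending it is left out so the example does not depend on a running cluster or a particular client version.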

The other reason list_keys was considered expensive is that it used to be a blocking operation: it involved what the Basho developers refer to as a ‘fold’ over all of the keys. list_keys now uses a snapshot of the bucket (instead of reading the live key space), which makes it a lighter-weight operation as well.

Listing keys gets easier if you upgrade to Riak 1.0. If you’re using the LevelDB back end, you can enable secondary indexes and use the $key index (automatically provided by Riak) to get a list of all keys in a bucket.
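As a hedged illustration of the $key approach, the sketch below builds the URL for a range query against that index over Riak’s HTTP interface. The path layout (/buckets/<bucket>/index/<index>/<start>/<end>) is my understanding of the HTTP secondary-index interface, so check it against the docs for your version. The helper name key_index_url, the host, and the range bounds are placeholders; the range has to span every key you expect if you want the full list.

```python
from urllib.parse import quote


def key_index_url(base_url, bucket, start_key, end_key):
    """Return the URL for a range query on the built-in $key index."""
    # Percent-encode each path segment, but leave "$" alone so the
    # index name stays readable as "$key".
    parts = [quote(p, safe="$") for p in (bucket, "$key", start_key, end_key)]
    return "{}/buckets/{}/index/{}/{}/{}".format(base_url.rstrip("/"), *parts)


# Hypothetical bounds: every key from "a" through "z" in the "users" bucket.
url = key_index_url("http://localhost:8098", "users", "a", "z")
```

A GET on that URL should return the matching keys as JSON, without the map phase ever touching the stored values.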

As for why Riak doesn’t ship with a better implementation of something like this: ask what the functionality is for. In an RDBMS, getting all primary keys of a table involves a full table scan. In Riak, getting all keys from a bucket requires scanning all data on every node, shipping the key names back to the originating node, combining that data, and then sending it to the calling client. Because of Riak’s distributed, unordered state, this operation is expensive no matter how you slice it. There are, as I outlined above, ways to make it better.
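The scan-and-gather pattern described above can be sketched with a toy model. This is a simplification, not Riak’s implementation: each node is just a list of (bucket, key) pairs, and the function name list_keys and the sample data are invented for the example. The point is that the amount of work tracks the total data on the cluster rather than the size of the bucket you asked about.

```python
def list_keys(nodes, bucket):
    """Simulate a coordinator gathering the keys of `bucket` from every node."""
    gathered = set()
    scanned = 0
    for node in nodes:
        # Each node walks everything it stores, including entries
        # that belong to other buckets.
        for stored_bucket, key in node:
            scanned += 1
            if stored_bucket == bucket:
                # A set drops keys reported by more than one replica.
                gathered.add(key)
    return sorted(gathered), scanned


# Three toy nodes holding overlapping replicas of two buckets.
nodes = [
    [("users", "alice"), ("orders", "o1"), ("users", "bob")],
    [("users", "bob"), ("orders", "o2")],
    [("users", "alice"), ("users", "carol"), ("orders", "o1")],
]
keys, scanned = list_keys(nodes, "users")
```

Here three distinct keys come back, but all eight stored entries had to be visited to find them.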
