I am using Posts and TaggedPosts column families as shown in this example

Editorial Team

Asked: June 3, 2026 (2026-06-03T11:47:53+00:00)

I am using Posts and TaggedPosts column families as shown in this example.

I would like to be able to find posts tagged with tags ‘A’, ‘B’ and ‘C’ (for example).

The problem is that I have to read TaggedPosts with key A in its entirety, rather than just fetching the first 10 results as shown in the example, and then intersect that with all of TaggedPosts with key B so as not to miss any post, and so on.

This is very inefficient. What would you advise in order to do this?

I was thinking of changing the TaggedPosts structure to use post ids as row keys:

create column family TaggedPosts with ... and column_metadata=[
    {column_name: tag1, ..., index_type: KEYS},
    {column_name: tag2, ..., index_type: KEYS},
    {column_name: tag3, ..., index_type: KEYS}
];

and do:

get TaggedPosts where tag1=A and tag2=B and tag3=C;

but I am not sure it would be much more efficient than intersecting/filtering client-side.

1 Answer

Editorial Team · Answer 1 · 2026-06-03T11:48:00+00:00

I think the ideal schema for your case would depend on how often you need to perform that intersecting query, and whether you need to be able to get quick results for any arbitrary pair of tags, or for any arbitrary set of N tags, or whether you’ll only need to do that with certain, limited tags.

If, as I suspect, you want to be able to query for posts matching any arbitrary set of tags, there may not be any better solution than to have a schema like this (cql3):

CREATE COLUMNFAMILY TaggedPosts (
    tag text,
    post uuid,
    blog_rowentries_rowkey text,
    PRIMARY KEY (tag, post)
) WITH COMPACT STORAGE;

-- (note that this is the same actual data layout used in the "wtf is a supercolumn" article)

And then query for “posts with A”, “posts with B”, etc. like this:

SELECT * FROM TaggedPosts WHERE tag = 'A' LIMIT 100;
SELECT * FROM TaggedPosts WHERE tag = 'B' LIMIT 100;

So each tag is queried individually, and you then merge the results client-side. The limit of 100 there may not be ideal for your data; the best value depends on how likely your tags are to overlap. It is obviously not intended to guarantee that you get all the results you want; it’s just a batch size. If you don’t find enough posts matching all tags, you query for more batches from the tags whose last fetched post has the lowest uuid-time until you do.
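The batch-and-merge loop described above could look roughly like the following Python sketch. It is not tied to any particular Cassandra driver: `fetch_page` is a hypothetical callback that I am assuming wraps a paged query along the lines of `SELECT post FROM TaggedPosts WHERE tag = ? AND post > ? LIMIT ?` and returns post ids in ascending order. `make_in_memory_fetch` is only a stand-in so that the example runs without a cluster, and it uses plain integers in place of uuids.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence

# Hypothetical signature: fetch_page(tag, after, limit) returns up to `limit`
# post ids for `tag`, ascending, all greater than `after` (None = from start).
FetchPage = Callable[[str, Optional[int], int], List[int]]


def intersect_tags(tags: Sequence[str], fetch_page: FetchPage,
                   wanted: int, batch_size: int = 100) -> List[int]:
    """Return up to `wanted` post ids (ascending) carrying every tag in `tags`."""
    seen: Dict[str, set] = {tag: set() for tag in tags}
    frontier: Dict[str, int] = {}   # last post id fetched so far, per tag
    open_tags = set(tags)           # tags that may still have unread rows

    def advance(tag: str) -> None:
        page = fetch_page(tag, frontier.get(tag), batch_size)
        seen[tag].update(page)
        if page:
            frontier[tag] = page[-1]
        if len(page) < batch_size:
            open_tags.discard(tag)  # short page: this tag is exhausted

    for tag in tags:                # one initial batch per tag
        advance(tag)

    while True:
        found = sorted(set.intersection(*seen.values()))
        # Nothing beyond the last post of an exhausted tag can match, so only
        # tags that have not yet read that far are worth querying again.
        ends = [frontier.get(t) for t in tags if t not in open_tags]
        useful = [t for t in open_tags
                  if all(e is not None and frontier[t] < e for e in ends)]
        if len(found) >= wanted or not useful:
            return found[:wanted]
        # Fetch the next batch from the tag that is furthest behind.
        advance(min(useful, key=lambda t: frontier[t]))


def make_in_memory_fetch(data: Dict[str, Iterable[int]]) -> FetchPage:
    """Stand-in for the database: serves sorted pages from a dict."""
    def fetch_page(tag: str, after: Optional[int], limit: int) -> List[int]:
        ids = sorted(data.get(tag, []))
        if after is not None:
            ids = [i for i in ids if i > after]
        return ids[:limit]
    return fetch_page
```

Because every tag is read in the same order, any post already present in all the fetched batches is a confirmed match, and no smaller match can still be hiding in an unread batch. That is why the loop can stop as soon as it has `wanted` results instead of reading every row for every tag.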

You could do a lot better than this in terms of efficiency and ease of coding with a Solr index, since this is more of a full-text-search kind of problem, but you’d need DataStax Enterprise or some other way to integrate Solr yourself. (Disclaimer: I work for DataStax.)

Best advice I can give on the topic, though, is not to use supercolumns.
