I am using Posts and TaggedPosts column families, as shown in this example.
I would like to be able to find Posts tagged with tags ‘A’, ‘B’ and ‘C’ (for the example).
The problem is that I have to read the entire TaggedPosts row with key A, not just fetch the first 10 results as shown in the example, then intersect it with the entire TaggedPosts row with key B so as not to miss any post, and so on.
That is very inefficient. What would you advise in order to do this?
I was thinking of changing the TaggedPosts structure to use post ids as row keys, like this:
create column family TaggedPosts with ... and column_metadata=[
{column_name: tag1, ..., index_type: KEYS},
{column_name: tag2, ..., index_type: KEYS},
{column_name: tag3, ..., index_type: KEYS}];
and do:
get TaggedPosts where tag1=A and tag2=B and tag3=C;
But I'm not sure that would be much more efficient than intersecting/filtering client-side.
I think the ideal schema for your case would depend on how often you need to perform that intersecting query, and whether you need to be able to get quick results for any arbitrary pair of tags, or for any arbitrary set of N tags, or whether you’ll only need to do that with certain, limited tags.
If, as I suspect, you want to be able to query for posts matching any arbitrary set of tags, there may not be any better solution than to have a schema like this (cql3):
And then query for “posts with A”, “posts with B”, etc. like this:
..so they’re individually queried, and then you merge the results client-side. The limit there of 100 may not be ideal for your data; the ideal value depends on how likely your tags are to overlap. It’s not intended to guarantee that you get all the results you want, obviously, it’s just a batch size. If you don’t find enough posts matching all tags, you query for more batches from the tags with the lowest uuid-times until you do.
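To make the batching idea concrete, here is a rough Python sketch of the client-side merge. It is only an illustration under some assumptions of mine, not code from any driver: `fetch_batch` is a hypothetical callback standing in for one per-tag query (the table and column names in the comment are made up), each tag's posts are assumed to come back in ascending id order, and plain integers stand in for the time-ordered uuids.

```python
import bisect


def intersect_tags(tags, fetch_batch, wanted, batch_size=100):
    """Return up to `wanted` post ids carrying every tag in `tags` (non-empty).

    fetch_batch(tag, after, limit) is a HYPOTHETICAL callback standing in for
    a per-tag query, e.g. something shaped like
        SELECT post FROM tagged_posts WHERE tag = ? AND post > ? LIMIT ?
    It must return ids in ascending order, all greater than `after`
    (`after` is None for the first batch).
    """
    seen = {tag: set() for tag in tags}      # ids fetched so far, per tag
    frontier = {tag: None for tag in tags}   # last id fetched, per tag
    exhausted = set()                        # tags with nothing left to fetch

    def pull(tag):
        batch = fetch_batch(tag, frontier[tag], batch_size)
        if len(batch) < batch_size:
            exhausted.add(tag)
        if batch:
            seen[tag].update(batch)
            frontier[tag] = batch[-1]

    for tag in tags:
        pull(tag)

    while True:
        matches = sorted(set.intersection(*seen.values()))
        if len(matches) >= wanted:
            return matches[:wanted]
        # A tag with no posts at all means nothing can match.
        if any(frontier[tag] is None for tag in exhausted):
            return matches
        # No match can lie beyond the last id of a fully read tag.
        bound = min((frontier[tag] for tag in exhausted), default=None)
        candidates = [
            tag for tag in tags
            if tag not in exhausted and (bound is None or frontier[tag] < bound)
        ]
        if not candidates:
            return matches
        # Fetch the next batch for the tag that is furthest behind.
        pull(min(candidates, key=lambda tag: frontier[tag]))


def make_fetcher(index):
    """In-memory stand-in for the database: index maps tag -> sorted ids."""
    def fetch_batch(tag, after, limit):
        ids = index.get(tag, [])
        start = 0 if after is None else bisect.bisect_right(ids, after)
        return ids[start:start + limit]
    return fetch_batch


# Toy data: a post id is tagged 'A' if even, 'B' if divisible by 3, 'C' if by 5.
index = {
    "A": [i for i in range(1, 200) if i % 2 == 0],
    "B": [i for i in range(1, 200) if i % 3 == 0],
    "C": [i for i in range(1, 200) if i % 5 == 0],
}
first_three = intersect_tags(["A", "B", "C"], make_fetcher(index), wanted=3, batch_size=10)
```

The point of always pulling from the tag that is furthest behind is that any id at or below the lowest frontier has been fully checked against every tag, so the matches found so far are always a complete prefix of the real intersection. With a real cluster, `fetch_batch` would wrap whatever query your driver issues for a single tag.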
You could do a lot better than this in terms of efficiency and ease-of-coding with a Solr index, since this is more of a full-text-search kind of a problem, but you’d need Datastax Enterprise or some other way to integrate Solr yourself. (Disclaimer: I work for Datastax.)
Best advice I can give on the topic, though, is not to use supercolumns.