I am relatively new to cassandra and its data model. I have a large

Question

Asked: June 6, 2026 (2026-06-06T07:54:14+00:00)

I am relatively new to Cassandra and its data model. I have a large data set in which each record is described by a location on a chromosome (chromosome:start-end), where there are 24 chromosomes and start and end are integers. The query I would like to support is to find all locations in the genome that overlap with a given set of other locations. I can create a simple R-tree-based “indexing” scheme if there are no other ideas, but I thought someone might have run into this problem and come up with a solution.


1 Answer

Editorial Team · Answer 1 · 2026-06-06T07:54:18+00:00

Since you need to query on two dimensions (start and end), one option is to use another database, such as MongoDB, that supports this kind of geospatial indexing and querying; see Bounds Queries.

In Cassandra, I think the best you can do is use geocells (doc) or another space-filling curve.

You convert the start and end of each record to a geohash. You can then search for a bounding box, with start in [s1, s2] and end in [e1, e2], by scanning the geocells between geohash(s1, e1) and geohash(s2, e2). That range covers every location in the bounding box, but it can also include some locations outside it, so filter the results on start and end afterwards.
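To make the idea concrete, here is a minimal sketch in Python, not a Cassandra implementation. It assumes a plain Z-order (Morton) code in place of a real geohash library, coordinates that fit in 32 bits, and an in-memory sorted list standing in for a range scan over a clustering column. The names `morton`, `build_index`, `box_query` and `overlap_query` are made up for illustration. It also relies on the fact that an interval overlaps a query [q_start, q_end] exactly when start <= q_end and end >= q_start, which is itself a box in the (start, end) plane.

```python
from bisect import bisect_left, bisect_right

BITS = 32  # assumption: start and end both fit in 32 bits


def morton(start, end, bits=BITS):
    """Interleave the bits of start and end into one Z-order code."""
    code = 0
    for i in range(bits):
        code |= ((start >> i) & 1) << (2 * i + 1)  # start bits on odd positions
        code |= ((end >> i) & 1) << (2 * i)        # end bits on even positions
    return code


def build_index(intervals):
    """Return (code, start, end) rows sorted by code, like a clustering order."""
    return sorted((morton(s, e), s, e) for s, e in intervals)


def box_query(index, s1, s2, e1, e2):
    """Find rows with start in [s1, s2] and end in [e1, e2]."""
    lo, hi = morton(s1, e1), morton(s2, e2)
    codes = [row[0] for row in index]
    i = bisect_left(codes, lo)
    j = bisect_right(codes, hi)
    # The code range is a superset of the box, so filter on the real values.
    return [(s, e) for _, s, e in index[i:j] if s1 <= s <= s2 and e1 <= e <= e2]


def overlap_query(index, q_start, q_end, max_coord):
    """Intervals overlapping [q_start, q_end]: start <= q_end and end >= q_start."""
    return box_query(index, 0, q_end, q_start, max_coord)


index = build_index([(10, 20), (15, 30), (40, 50), (5, 8)])
hits = overlap_query(index, 18, 42, max_coord=100)
```

In a real table you would presumably keep the chromosome as the partition key and the code as a clustering column, so that each query stays within one partition. Note that the overlap box starts at 0, so the scanned range can be wide; if you know the maximum interval length, you can raise the lower bound on start to q_start minus that length to narrow it.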
