The Archive Base

Editorial Team
Asked: May 12, 2026 (2026-05-12T19:19:29+00:00)

I am trying to figure out exactly what these new fangled data stores such


I am trying to figure out exactly what these newfangled data stores such as Bigtable, HBase, and Cassandra really are.

I work with massive amounts of stock market data, billions of rows of price/quote data that can add up to 100s of gigabytes every day (although these text files often compress by at least an order of magnitude). This data is basically a handful of numbers, two or three short strings and a timestamp (usually millisecond level). If I had to pick a unique identifier for each row, I would have to pick the whole row (since an exchange may generate multiple values for the same symbol in the same millisecond).

I suppose the simplest way to map this data to Bigtable (I’m including its derivatives) is by symbol name and date (which may return a very large time series; more than a million data points isn’t unheard of). From reading their descriptions, it looks like multiple keys can be used with these systems. I’m also assuming that decimal numbers are not good candidates for keys.

Some of these systems (Cassandra, for example) claim to be able to do range queries. Would I be able to efficiently query, say, all values for MSFT, for a given day, between 11:00 am and 1:30 pm?

What if I want to search across ALL symbols for a given day, and request all symbols that have a price between $10 and $10.25 (so I’m searching the values, and want keys returned as a result)?

What if I want to get two time series, subtract one from the other, and return the two time series and their result? Will I have to do this logic in my own program?

Reading the relevant papers seems to show that these systems are not a very good fit for massive time series systems. However, if systems such as Google Maps are based on them, I think time series should work as well. For example, think of time as the x-axis, prices as the y-axis, and symbols as named locations: all of a sudden it looks like Bigtable should be the ideal store for time series (if the whole earth can be stored, retrieved, zoomed, and annotated, stock market data should be trivial).

Can some expert point me in the right direction or clear up any misunderstandings?

Thanks


1 Answer

  1. Editorial Team
    Added an answer on May 12, 2026 at 7:19 pm (2026-05-12T19:19:29+00:00)

    I am not an expert yet, but I’ve been playing with Cassandra for a few days now, and I have some answers for you:

    1. Don’t worry about the amount of data; it matters little with systems like Cassandra, as long as you have the $$$ for a large hardware cluster.

    Some of these systems (Cassandra, for example) claim to be able to do range queries. Would I be able to efficiently query, say, all values for MSFT, for a given day, between 11:00 am and 1:30 pm?

    Cassandra is very useful when you know how to work with keys. It can sift through keys very quickly. So to search for MSFT between 11:00 am and 1:30 pm, you’d have to key your rows like this:

    MSFT-timestamp, GOOG-timestamp, etc.
    Then you can tell Cassandra to find all keys in a range, for example from MSFT-now to MSFT-now+1hour.
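
To make the key idea concrete, here is a minimal sketch in plain Python. It is not Cassandra client code: `SortedKeyStore`, `row_key`, and `scan_bound` are hypothetical names, and the store is only a toy stand-in for any system that keeps row keys in sorted order (in Cassandra, whether row keys are ordered depends on how the cluster is configured, so treat that as an assumption). The timestamp is zero-padded so that string order matches time order, and a sequence number is appended because, as the question notes, an exchange may emit several values for one symbol in the same millisecond.

```python
import bisect
from datetime import datetime, timezone


def scan_bound(symbol: str, ts: datetime) -> str:
    """Key prefix 'SYMBOL-<zero-padded epoch millis>' used as a range bound."""
    millis = round(ts.timestamp() * 1000)
    return f"{symbol}-{millis:016d}"


def row_key(symbol: str, ts: datetime, seq: int = 0) -> str:
    """Full row key; seq separates ticks that share one millisecond."""
    return f"{scan_bound(symbol, ts)}-{seq:04d}"


class SortedKeyStore:
    """Toy stand-in (hypothetical) for a store that keeps row keys sorted."""

    def __init__(self):
        self._keys = []    # row keys, always kept in sorted order
        self._values = {}  # row key -> stored value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range_scan(self, start_key, end_key):
        """Return (key, value) pairs with start_key <= key < end_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def at(hour, minute):
    """Helper for the example: a UTC time on an arbitrary sample day."""
    return datetime(2010, 1, 4, hour, minute, tzinfo=timezone.utc)


store = SortedKeyStore()
store.put(row_key("MSFT", at(10, 59)), "30.10")
store.put(row_key("MSFT", at(11, 0)), "30.20")
store.put(row_key("MSFT", at(11, 0), seq=1), "30.21")  # same millisecond
store.put(row_key("MSFT", at(13, 31)), "30.40")
store.put(row_key("GOOG", at(12, 0)), "620.00")

# All MSFT rows from 11:00 (inclusive) up to 13:30 (exclusive).
msft_rows = store.range_scan(scan_bound("MSFT", at(11, 0)),
                             scan_bound("MSFT", at(13, 30)))
for key, value in msft_rows:
    print(key, value)
```

The scan touches only the keys between the two bounds, which is why the key layout matters more than the total amount of data.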

    What if I want to search across ALL symbols for a given day, and request all symbols that have a price between $10 and $10.25 (so I’m searching the values, and want keys returned as a result)?

    I am not an expert, but so far I have found that Cassandra doesn’t search by values at all. So if you want to do the above, you will have to create another table dedicated to just this problem and design its schema to fit the case. It won’t be much different from what I described above: it’s all about naming your keys and columns, and Cassandra can find those very quickly!
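
One way such a dedicated table could be keyed is sketched below in plain Python, again as an illustration rather than real Cassandra code. The names `index_key` and `symbols_in_price_range` are hypothetical, and the layout (day, then price in whole cents, then symbol) is just an assumption that fits this one query. Converting the price to zero-padded integer cents sidesteps the worry about decimal numbers as keys; prices quoted in fractions of a cent would need a finer unit.

```python
import bisect
from decimal import Decimal


def index_key(day: str, price: str, symbol: str) -> str:
    """Index key 'YYYYMMDD-<zero-padded cents>-SYMBOL' (hypothetical layout)."""
    cents = int(Decimal(price) * 100)
    return f"{day}-{cents:010d}-{symbol}"


def symbols_in_price_range(sorted_keys, day, low, high):
    """Symbols with at least one price in [low, high] on the given day."""
    start = f"{day}-{int(Decimal(low) * 100):010d}"
    # '~' sorts after ASCII letters and digits, so this upper bound still
    # includes every symbol stored at exactly the `high` price.
    end = f"{day}-{int(Decimal(high) * 100):010d}-~"
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_right(sorted_keys, end)
    return sorted({key.rsplit("-", 1)[1] for key in sorted_keys[lo:hi]})


# The "second table": one index key per observed (day, price, symbol).
index = sorted([
    index_key("20100104", "9.99", "DDD"),
    index_key("20100104", "10.00", "AAA"),
    index_key("20100104", "10.10", "AAA"),
    index_key("20100104", "10.25", "BBB"),
    index_key("20100104", "10.26", "CCC"),
    index_key("20100105", "10.10", "EEE"),
])

matches = symbols_in_price_range(index, "20100104", "10.00", "10.25")
print(matches)
```

The price search becomes another key range scan, at the cost of writing every tick twice: once into the main table and once into this index.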

    What if I want to get two time series, subtract one from the other, and return the two time series and their result? Will I have to do this logic in my own program?

    Correct, all of the logic is done inside your program. This is not MySQL; it is just a storage engine. (But I am sure the next versions will offer this sort of thing.)
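
As a rough idea of what that client-side logic might look like, here is a small Python sketch. The function name `subtract_series` is made up for illustration, each series is assumed to have already been fetched into a dict of timestamp to price, and only timestamps present in both series are compared (real tick data would probably need resampling or interpolation first).

```python
from decimal import Decimal


def subtract_series(a, b):
    """Return (timestamp, a_value, b_value, a_value - b_value) rows
    for every timestamp present in both series, in time order."""
    shared = sorted(a.keys() & b.keys())
    return [(t, a[t], b[t], a[t] - b[t]) for t in shared]


# Timestamps are epoch milliseconds; prices are Decimal to avoid float noise.
msft = {1000: Decimal("30.20"), 2000: Decimal("30.25"), 3000: Decimal("30.10")}
goog = {2000: Decimal("20.05"), 3000: Decimal("20.00"), 4000: Decimal("19.95")}

rows = subtract_series(msft, goog)
for t, a_val, b_val, diff in rows:
    print(t, a_val, b_val, diff)
```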

    Please remember that I am a novice at this; if I am wrong, feel free to correct me.
