I have a database in which a single table holds billions of rows per month, and I have data for the past 5 years. I have tried every optimization I know of, but the latency is not decreasing. I know there are solutions such as horizontal sharding and vertical sharding, but I am not sure which open source implementations exist or how much development time the switch would require. Does anyone have experience with such systems?
Thank you.
Nobody can suggest anything without a use case. When you have data that’s “Sagan-esque” in magnitude, the use case is all-important, since, as you’ve likely discovered, there simply isn’t any “general” technique that works. The numbers are simply too large.
So, you need to be clear about what you want to do with this data. If the answer is “everything”, then you get slow performance, because you can’t optimize for “everything”.
Edit:
Well, which is it? 2 or 3? How big are the result sets? Do you need access to all 5 years or just the last month? Do you really need all that detail, or can it be summarized? Do you need to sort it? Are the keys enough? How often is the data updated? How fast does the data need to be online once it is updated? What kind of service level does the data need to have? 24x7? 9-5x5? Is day-old data OK? Who’s using the data? Interactive users? Batch reports? Exports to outside entities?
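To make the "all 5 years or just the last month" question concrete, here is a minimal sketch of time-based horizontal partitioning, assuming the data can be split into one table (or shard) per month. The `events` prefix, the `events_YYYY_MM` naming scheme, and both function names are made up for illustration; they are not from any particular product. The point is only that a query's date range decides how many shards it has to touch.

```python
from datetime import date
from typing import List


def shard_for(day: date, prefix: str = "events") -> str:
    """Return the (hypothetical) monthly shard name that holds rows for `day`."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"


def shards_for_range(start: date, end: date, prefix: str = "events") -> List[str]:
    """List every monthly shard a query over [start, end] would have to touch."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    # Walk month by month until we pass the month containing `end`.
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names
```

With this layout a "last month" query reads one shard, while a query over the full 5 years reads 60, which is why the answers to the questions above matter before picking a design.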