This is more a conceptual question. It’s inspired from using some extremely large table

Question

Asked: May 23, 2026 (2026-05-23T20:56:55+00:00)

This is more of a conceptual question. It’s inspired by working with some extremely large tables where even a simple query takes a long time (despite proper indexing). I was wondering if there is a better structure than just letting the table grow continually.

By large I mean 10,000,000+ records, growing by something like 10,000 per day. A table like that would gain an additional 10,000,000 records roughly every 2.7 years. Let’s say the more recent records are accessed the most, but the older ones need to remain available.
I have two conceptual ideas to speed it up.

1) Maintain a master table that holds all the data, indexed by date in reverse order. Create a separate view for each year that holds only the data for that year. Then, when a query is expected to pull only a few records from a three-year span, I could use a union to combine the three views and select from those.

2) The other option would be to create a separate table for every year and then, again, use a union to combine them when querying.

Does anyone else have any other ideas or concepts? I know this is a problem Facebook has faced, so how do you think they handled it? I doubt they have a single table (status_updates) that contains 100,000,000,000 records.


1 Answer

Editorial Team · Answer 1 · 2026-05-23T20:56:57+00:00

The main RDBMS vendors all offer similar concepts: partitioned tables, partitioned views, and combinations of the two.

There is one immediate benefit: because the data is split across multiple conceptual tables, any query that filters on the partition key can automatically skip every partition that cannot contain matching rows.
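To make the pruning idea concrete, here is a minimal sketch that emulates it by hand with Python's built-in `sqlite3` module. SQLite has no native partitioning, so the per-year tables (`events_2023` and so on), the `YEARS` tuple, and every function name here are hypothetical stand-ins; a real RDBMS with partitioned tables would do this routing for you.

```python
import sqlite3
from datetime import date

# Hypothetical partitions: one table per year, keyed on the "created" date.
YEARS = (2023, 2024, 2025)


def create_partitions(conn):
    """Create one table (plus a date index) per year."""
    for year in YEARS:
        conn.execute(
            f"CREATE TABLE events_{year} ("
            "id INTEGER PRIMARY KEY, created TEXT NOT NULL, body TEXT)"
        )
        conn.execute(
            f"CREATE INDEX idx_events_{year}_created ON events_{year} (created)"
        )


def insert_event(conn, created, body):
    """Route a row to the table that matches its partition key (the year)."""
    conn.execute(
        f"INSERT INTO events_{created.year} (created, body) VALUES (?, ?)",
        (created.isoformat(), body),
    )


def partitions_for(start, end):
    """Return only the years that can contain rows in [start, end]."""
    return [y for y in YEARS if start.year <= y <= end.year]


def query_range(conn, start, end):
    """UNION ALL over the relevant yearly tables only; the rest are skipped."""
    years = partitions_for(start, end)
    if not years:
        return []
    selects = [
        f"SELECT created, body FROM events_{y} WHERE created BETWEEN ? AND ?"
        for y in years
    ]
    sql = " UNION ALL ".join(selects) + " ORDER BY created"
    params = [p for _ in years for p in (start.isoformat(), end.isoformat())]
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_partitions(conn)
    insert_event(conn, date(2023, 6, 1), "old")
    insert_event(conn, date(2025, 1, 2), "new")
    # Only events_2025 is touched by this query.
    print(query_range(conn, date(2025, 1, 1), date(2025, 12, 31)))
```

This is essentially the "separate table per year plus union" idea from the question, with the table selection driven by the date range. Native partitioning gives the same effect without the application having to build the union itself.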

From an RDBMS management perspective, having the data divided into separate partitions allows operations such as backup, restore, and indexing to be performed at the partition level. This helps reduce downtime and allows far faster archiving, since an entire partition can be removed at once.
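As a rough illustration of partition-level archiving, the sketch below again uses `sqlite3` with hypothetical per-year tables (`events_<year>`) standing in for real partitions. Retiring a year becomes a single `DROP TABLE` rather than a row-by-row `DELETE` against one huge table. The helper names are assumptions made for this example, not any vendor's API.

```python
import sqlite3


def create_year_table(conn, year, rows):
    """Create a hypothetical yearly partition and fill it with dummy rows."""
    conn.execute(f"CREATE TABLE events_{year} (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        f"INSERT INTO events_{year} (body) VALUES (?)",
        [(f"row {i}",) for i in range(rows)],
    )


def list_partitions(conn):
    """List the yearly tables currently present, in name order."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name GLOB 'events_*' ORDER BY name"
    )
    return [name for (name,) in cur]


def drop_year(conn, year):
    """Retire a whole year at once; other partitions are left untouched."""
    conn.execute(f"DROP TABLE events_{int(year)}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for y in (2023, 2024, 2025):
        create_year_table(conn, y, rows=1000)
    drop_year(conn, 2023)
    print(list_partitions(conn))
```

In a production system the dropped partition would normally be backed up or moved to archive storage first; that step is omitted here.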

There are also non-relational storage mechanisms such as NoSQL stores and MapReduce, but ultimately how the data is used, loaded, and archived becomes the driving factor in deciding which structure to use.

10 million rows is not that large by the standards of large systems; partitioned systems can and do hold billions of rows.
