I’ve got a large, rapidly expanding database with a number of busy tables that log every aspect of users’ behaviour.
At the moment, I have a studio where users can see this usage and behaviour displayed in charts and so on. The thing is, it’s seriously intensive to load this stuff now. One project had 80,000 users, and it takes an age to load the stats.
Now, the tables are quite well structured and indexed on joins etc. I’ve had advice and sought learning along the way on best practice to prepare for this data size. But without much more scope for query/table optimisation, how else can I speed up this intensive process?
I notice most analytics and such allow you to view up until yesterday by default. Does that help?
- Does this mean the statistics can be cached by query_cache on mysql? If the query’s date range always ends tomorrow (so that it counts today’s stats), will it never be cached?
- Is it more sensible to compile static XML files (or similar) each hour and read from those, instead of running the queries each time?
- How else?
Any thoughts very much welcome.
You’d want to split things up into two databases: one optimized for insertion to capture the data, and a second one optimized for data retrieval. You can’t do this with one single database handling both tasks. Optimizing for heavy data insertion means reducing the amount of indexing to the absolute bare minimum (basically just primary keys), and removing keys kills performance when it comes time to do the data mining.
So… two databases. Capture all the data into the insert-optimized one. Then have a scheduled job slurp over the day’s data capture into the other database, and run your analyses there.
As a side effect, this is where the “up until yesterday” restriction comes from. Today’s data won’t be available, as it’s still sitting in the separate capture database.
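To make the “up until yesterday” default concrete, here is a small Python sketch of how the reporting side might pick its date range. The function name and the 30-day default are assumptions for illustration only. Computing the bounds in application code and passing them as literal dates also means the query text stays identical all day, which is friendlier to any cache keyed on the query string.

```python
from datetime import date, timedelta


def default_report_window(today, days=30):
    """Return (start, end) dates for a report that ends yesterday.

    `days` is the number of days covered, inclusive of both ends.
    """
    end = today - timedelta(days=1)          # yesterday: last fully loaded day
    start = end - timedelta(days=days - 1)   # inclusive start of the window
    return start, end
```

For example, `default_report_window(date.today())` gives the last 30 complete days, and the two values can be bound straight into the reporting query’s date filter.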