This question is a follow-up to a previous question I asked about how best to model different kinds of time quantities and timeframes:
In a database, how to store event occurrence dates and timeframes for fast/elegant querying?
Given a table of events, I’d like the simplest way to model and query events that have these kinds of occurrences:
- One-time: XY Rock band has a show on Dec. 12, 2014 at the Rockhouse
- Annually: Volunteer at the soup kitchen on Thanksgiving morning
- Monthly: Free night at the MoMA every first Saturday
- Weekly: Regular business hours
I’ve been kicking around doing a schema in this form:
- Name
- Description
- start_datetime
- end_datetime
- frequency_type (string, e.g. ‘Weekly’, ‘Monthly’)
- mon (boolean)
- tues
- wed
- thu
- fri
- sat
- sun (all booleans)
- schedule (text)
- frequency_description (text)
A common use case I foresee: on a given Tuesday, say 4/5/2016, I want to find everything that is happening on that Tuesday, including all businesses that are open on regular Tuesdays, anything that happens monthly on a Tuesday, and anything happening on that specific date.
So the pseudocode query would be something like:
SELECT * FROM events WHERE `tues` = TRUE OR DATE(start_datetime) = '2016-04-05';
At the application/controller level I could apply the necessary logic to exclude all “monthly” Tuesday events that don’t happen on the first Tuesday, using a key/store in frequency_description (for discussion’s sake, I’m going to ignore the “annual” edge case in which something happens every fourth Thursday of November or some such thing). It’d be nice to do that exclusion in the query, but I’m not sure how to design the table to allow that and still keep a simple SELECT.
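To make the single-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a trimmed-down events table; the `week_of_month` column is a hypothetical stand-in for whatever ends up stored in frequency_description, and the sample rows are invented for illustration.

```python
import sqlite3
from datetime import date

def nth_weekday_of_month(d: date) -> int:
    """Return 1 if d is the first such weekday of its month, 2 if the second, ..."""
    return (d.day - 1) // 7 + 1

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        name TEXT,
        start_datetime TEXT,     -- ISO 8601 text, e.g. '2016-04-05 20:00'
        frequency_type TEXT,     -- 'Once', 'Weekly' or 'Monthly'
        tues INTEGER DEFAULT 0,  -- boolean flag; the other day columns are omitted
        week_of_month INTEGER    -- hypothetical: 1 = first, 2 = second, ...
    )
""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        ("Cafe business hours", "2014-01-01 09:00", "Weekly", 1, None),
        ("First-Tuesday gallery night", "2014-01-07 18:00", "Monthly", 1, 1),
        ("Third-Tuesday book club", "2014-01-21 19:00", "Monthly", 1, 3),
        ("One-time concert", "2016-04-05 20:00", "Once", 0, None),
        ("Other concert", "2016-04-06 20:00", "Once", 0, None),
    ],
)

def tuesday_events(conn, day: date):
    """Run the simple SELECT, then drop monthly events in the application."""
    rows = conn.execute(
        "SELECT name, frequency_type, week_of_month FROM events "
        "WHERE tues = 1 OR DATE(start_datetime) = ?",
        (day.isoformat(),),
    ).fetchall()
    nth = nth_weekday_of_month(day)
    # Keep everything except monthly events whose week does not match.
    return [name for name, freq, week in rows if freq != "Monthly" or week == nth]

print(tuesday_events(conn, date(2016, 4, 5)))
```

If the week number really were stored as an integer column like this, the same comparison could probably move into the WHERE clause by passing `nth` as a second parameter, which would keep the exclusion in the query.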
I’m also predicting that it’s not necessary to do a query in which I find all businesses open on Tuesday at 9AM. So the individual day fields can just be space-efficient booleans, with the schedule field being a data store for my non-normalized specific information. The application will have logic to parse and format it for display.
Is this overkill? Let’s say 70% of my events will be one-time, which eliminates the need for the mon, tues, wed, etc. columns and the schedule and frequency_description text-key-stores…
Should I instead have two tables? One for events, and one for some kind of event_relation in which the day_fields and key-store-textfields are joined?
That seems like a more efficient use of space…on the other hand, my query would have to be a SELECT and JOIN…which may be slower.
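For comparison, the two-table variant might look like the sketch below. The `event_recurrences` table and its columns are hypothetical names, and the rows are invented. One-time events have no row in the side table, so a LEFT JOIN keeps them in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        name TEXT,
        start_datetime TEXT
    );
    -- Hypothetical side table: only recurring events get a row here.
    CREATE TABLE event_recurrences (
        event_id INTEGER REFERENCES events(id),
        frequency_type TEXT,
        tues INTEGER DEFAULT 0
    );
    INSERT INTO events VALUES (1, 'One-time concert', '2016-04-05 20:00');
    INSERT INTO events VALUES (2, 'Cafe business hours', '2014-01-01 09:00');
    INSERT INTO events VALUES (3, 'Other concert', '2016-04-06 20:00');
    INSERT INTO event_recurrences VALUES (2, 'Weekly', 1);
""")

rows = conn.execute(
    """
    SELECT e.name
    FROM events AS e
    LEFT JOIN event_recurrences AS r ON r.event_id = e.id
    WHERE r.tues = 1 OR DATE(e.start_datetime) = ?
    ORDER BY e.id
    """,
    ("2016-04-05",),
).fetchall()
join_names = [name for (name,) in rows]
print(join_names)
```

The query is still a single statement. Whether the join costs anything noticeable at 10k to 100k rows is something to measure on the real data rather than assume.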
When dealing with a magnitude of records numbering from 10k to 100k, and doing simple EC2 hosting…should I care more about efficient space usage in my database (not just pure data storage space, but all the associated overhead with text fields and numerous columns)…or should I care more about simple SELECT statements?
You could just make your recurring events insert into the ‘one-off’ event table with a key referencing back to the master recurring event record (in a separate table).
While it’s not very good for space usage, you can take some shortcuts: for events that occur “every Tuesday from now to the end of all time”, the end time might default to, say, 200 years in the future, which means you’re only populating about 10k records (52 * 200) in this extreme case.
This would simplify your reading greatly as you would then just be looking for any ‘event’ that occurs on that date, and then you would do all your excludes based on the master recurring event table record.
So you have something like this:
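A minimal sketch of one way to read this layout, using Python's sqlite3 module. The table names, column names, and the `add_weekly` and `occurrences_on` helpers are all hypothetical. The end date is passed explicitly here, where the suggestion above would default it to some far-future date.

```python
import sqlite3
from datetime import date, datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recurring_events (
        id INTEGER PRIMARY KEY,
        name TEXT,
        frequency_type TEXT
    );
    CREATE TABLE event_occurrences (
        id INTEGER PRIMARY KEY,
        recurring_event_id INTEGER REFERENCES recurring_events(id),  -- NULL for one-offs
        name TEXT,
        start_datetime TEXT
    );
    CREATE INDEX idx_occurrences_start ON event_occurrences (start_datetime);
""")

def add_weekly(conn, name: str, first_start: datetime, until: date) -> int:
    """Insert the master record, then one occurrence row per week up to `until`."""
    cur = conn.execute(
        "INSERT INTO recurring_events (name, frequency_type) VALUES (?, 'Weekly')",
        (name,),
    )
    master_id = cur.lastrowid
    rows = []
    start = first_start
    while start.date() <= until:
        rows.append((master_id, name, start.isoformat(sep=" ")))
        start += timedelta(weeks=1)
    conn.executemany(
        "INSERT INTO event_occurrences (recurring_event_id, name, start_datetime) "
        "VALUES (?, ?, ?)",
        rows,
    )
    return len(rows)

def occurrences_on(conn, day: date):
    """Find everything starting on `day` with a range scan on the indexed column."""
    lower = day.isoformat()
    upper = (day + timedelta(days=1)).isoformat()
    return conn.execute(
        "SELECT name, start_datetime FROM event_occurrences "
        "WHERE start_datetime >= ? AND start_datetime < ? "
        "ORDER BY start_datetime",
        (lower, upper),
    ).fetchall()

created = add_weekly(conn, "Cafe business hours", datetime(2016, 1, 5, 9, 0), date(2016, 12, 31))
conn.execute(
    "INSERT INTO event_occurrences (name, start_datetime) "
    "VALUES ('One-time concert', '2016-04-05 20:00:00')"
)
print(created, occurrences_on(conn, date(2016, 4, 5)))
```

The read side uses a plain range comparison on start_datetime instead of wrapping the column in DATE(), so the index can be used for the lookup.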
Suppose you have 1000 weekly recurring events (and we assume you go with 200 years if there is no endDate): that’s going to be, say, 10M records. You then index the start_datetime field of the Event occurrence table, and your query will be very quick even with many more records than this. Compare the costs of this (reduced performance on writes and more space used) versus having to find every event where today is between startdate and enddate and then calculate whether the event is actually occurring today.
In the end it all comes down to: