I am creating a table that stores a counter for each product for each week.
Example:
id = 1
productId = 195
DateTime = 01/07/2012
Counter = 0
My question to you is about database storage space, query flexibility and performance.
Instead of the DateTime column, I thought about using a SmallInt ‘WeekNumber’ column.
I will decide on the date from which the weeks are counted (the base date). Let’s say 10/10/2012.
For each product and for each week, there will be a row that holds the total of something that I count on a daily basis (e.g. pageviews for a specific product page).
From what I’ve read:
Date column is 4 bytes
SmallInt is 2 bytes
I want to save as much space as possible, but I want to be able to query the database based on a range of dates (August 2012 to September 2013), a specific week in a specific year, etc.
Is this approach to the schema good, or will I run into problems with poor SQL performance, query flexibility, indexes, etc.?
If this table has no child tables (no foreign keys referencing it), then to conserve space you might consider omitting the surrogate primary key (id) and instead using a composite key (productId, date_) as the primary key. (From what you describe, it sounds as if you are going to want the combination of those columns to be UNIQUE, and both of those columns to be NOT NULL.)
If what you want to store is a “week” identifier rather than a DATE, there’s no problem on the database side of things, as long as your queries aren’t wrapping that column in an expression to get a DATE value to use in predicates. That is, for performance, your predicates are going to need to be on the bare “week identifier” column, compared directly against literal values.
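As a rough illustration of a predicate on the bare column, here is a minimal sketch using Python's built-in sqlite3 module. The table name `weekly_counter`, the index name, the base date of 2012-01-01 and the helper `week_id_for` are all assumptions made up for this example, and SQLite stands in for whatever database is actually in use. The date range is converted to week ids on the literal side, so the WHERE clause only touches `week_id` itself; the second query wraps the column in a date expression for contrast.

```python
import sqlite3
from datetime import date

BASE_DATE = date(2012, 1, 1)  # assumed base date (a Sunday), not from the original post


def week_id_for(d):
    """Whole weeks elapsed between BASE_DATE and d (hypothetical helper)."""
    return (d - BASE_DATE).days // 7


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly_counter ("
    " product_id INTEGER NOT NULL,"
    " week_id INTEGER NOT NULL,"
    " counter INTEGER NOT NULL DEFAULT 0,"
    " PRIMARY KEY (product_id, week_id))"
)
conn.execute("CREATE INDEX weekly_counter_week ON weekly_counter (week_id)")
conn.executemany(
    "INSERT INTO weekly_counter VALUES (195, ?, ?)",
    [(w, 10 * w) for w in range(100)],
)

# Convert the date range to week ids once, on the literal side.
lo = week_id_for(date(2012, 8, 1))
hi = week_id_for(date(2013, 9, 30))

# Bare column in the predicate: the index on week_id can be used.
SARGABLE = (
    "SELECT week_id, counter FROM weekly_counter"
    " WHERE week_id BETWEEN ? AND ?"
)
# Column wrapped in an expression: every row has to be evaluated.
NOT_SARGABLE = (
    "SELECT week_id, counter FROM weekly_counter"
    " WHERE date('2012-01-01', '+' || (week_id * 7) || ' days') BETWEEN ? AND ?"
)


def plan(sql, params):
    """Return the query plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


sargable_rows = conn.execute(SARGABLE, (lo, hi)).fetchall()
sargable_plan = plan(SARGABLE, (lo, hi))
wrapped_plan = plan(NOT_SARGABLE, ("2012-08-01", "2013-09-30"))
print(sargable_plan)
print(wrapped_plan)
```

The exact plan wording differs between SQLite versions and between database engines, but the first query should be reported as an index search and the second as a scan.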
Predicates like that on the bare column will be sargable (that is, they allow an index to be used). You do NOT want to be wrapping that week_id column in an expression that returns a DATE and then using that expression in the WHERE clause. (Having expressions on the literal side of the comparison is not a problem… you just don’t want them on the “table” side.) That’s really going to be the determining factor in whether you can use a week_id in place of a DATE column.
Using a “period id” in place of a DATE is fairly straightforward to implement for periods that are whole months. (It’s also straightforward for “days”, but is really of less benefit there.) Implementing this approach for “week” periods is more complicated, because of the handling you need for a week that is split between two years.
Consider, for example, that the last two days of this year (2012) are on Sunday and Monday, but Tuesday thru Saturday of that same week are in 2013. You’d need to decide whether that’s two separate weeks, or whether that’s the same week.
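To make the year-boundary issue concrete, here is a small Python sketch comparing two ways of numbering the week of 2012-12-30 through 2013-01-05. The base date of 2012-01-01 (a Sunday) and the helper names `week_id_for` and `calendar_week` are assumptions for illustration only. Counting whole weeks from a fixed base date treats the seven days as one week, while a (year, week-of-year) pair splits them in two.

```python
from datetime import date, timedelta

BASE_DATE = date(2012, 1, 1)  # assumed base date; it falls on a Sunday


def week_id_for(d):
    """Whole weeks elapsed since BASE_DATE (hypothetical helper)."""
    return (d - BASE_DATE).days // 7


def calendar_week(d):
    """(year, week-of-year) with Sunday as the first day of the week."""
    return (d.year, int(d.strftime("%U")))


# Sunday 2012-12-30 through Saturday 2013-01-05
week = [date(2012, 12, 30) + timedelta(days=i) for i in range(7)]

base_ids = {week_id_for(d) for d in week}
calendar_ids = {calendar_week(d) for d in week}

print(base_ids)              # one id for all seven days
print(sorted(calendar_ids))  # two different (year, week) pairs
```

Either numbering can work; the point is that the choice has to be made once and applied consistently wherever a week id is computed.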
But the 1-byte savings (of SMALLINT vs DATE) isn’t the real benefit. What the “week_id” column gets you (as I see it) is a single id value that identifies a week. Consider the date values '2012-07-30', '2012-07-31' and '2012-08-01': they all really represent the same week. So you have multiple values for the same week, such that a UNIQUE constraint on (product_id, date) doesn’t really GUARANTEE (on the database side) that you don’t have more than one row for the same week. (That’s not an insurmountable problem, of course; you can specify that you only store a Sunday (or Monday) date value.)
In summary,
To conserve space, I would first drop that surrogate id column, and make the combination of the product_id and the DATE be the primary key.
Then I would ONLY consider changing that DATE into a SMALLINT if I could GUARANTEE that all queries would be referencing that bare SMALLINT column, and NOT referencing an expression that converts the SMALLINT column back into a DATE.
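For what it's worth, the week_id variant of that schema could be sketched as below, again using Python's sqlite3 module as a stand-in. The table name `product_week_counter` and the helper `add_views` are hypothetical, and SQLite does not enforce SMALLINT storage sizes, so this only illustrates the key structure, not the byte savings. The composite primary key replaces the surrogate id and lets the database itself reject a second row for the same product and week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_week_counter ("
    " product_id INTEGER NOT NULL,"
    " week_id SMALLINT NOT NULL,"
    " counter INTEGER NOT NULL DEFAULT 0,"
    " PRIMARY KEY (product_id, week_id))"  # composite key, no surrogate id
)


def add_views(product_id, week_id, views):
    """Add to the weekly total, creating the row on first use (hypothetical helper)."""
    cur = conn.execute(
        "UPDATE product_week_counter SET counter = counter + ?"
        " WHERE product_id = ? AND week_id = ?",
        (views, product_id, week_id),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO product_week_counter (product_id, week_id, counter)"
            " VALUES (?, ?, ?)",
            (product_id, week_id, views),
        )


add_views(195, 52, 3)
add_views(195, 52, 4)
add_views(195, 53, 1)

# A second row for the same product and week violates the primary key.
try:
    conn.execute("INSERT INTO product_week_counter VALUES (195, 52, 0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute(
    "SELECT product_id, week_id, counter FROM product_week_counter"
    " ORDER BY week_id"
).fetchall()
print(rows, duplicate_rejected)
```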