I want to record user states and then be able to report historically based on the record of changes we’ve kept. I’m trying to do this in SQL (using PostgreSQL) and I have a proposed structure for recording user changes like the following.
CREATE TABLE users (
    userid SERIAL NOT NULL PRIMARY KEY,
    name VARCHAR(40),
    status CHAR NOT NULL
);

CREATE TABLE status_log (
    logid SERIAL,
    userid INTEGER NOT NULL REFERENCES users(userid),
    status CHAR NOT NULL,
    logcreated TIMESTAMP
);
That’s my proposed table structure, based on the data.
For the status field, ‘a’ represents an active user and ‘s’ represents a suspended user. For example:
INSERT INTO status_log (userid, status, logcreated) VALUES (1, 's', '2008-01-01');
INSERT INTO status_log (userid, status, logcreated) VALUES (1, 'a', '2008-02-01');
So this user was suspended on 1st Jan and active again on 1st of February.
If I ask for a list of suspended customers as of 15th January 2008, then userid 1 should show up. If I ask for the same list as of 15th February 2008, then userid 1 should not show up.
1) Is this the best way to structure this data for this kind of query?
2) How do I query the data in either this structure or in your proposed modified structure so that I can simply have a date (say 15th January) and find a list of customers that had an active status on that date in SQL only? Is this a job for SQL?
This can be done, but would be a lot more efficient if you stored the end date of each log. With your model you have to do something like:
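A minimal sketch of such a query, assuming PostgreSQL, the status_log table from the question, and 2008-01-15 as the report date. For each user it picks the latest log entry at or before the date and keeps it only if that entry's status is 's':

```sql
-- Users whose most recent status change on or before the report date
-- was a suspension. The report date is an example value.
SELECT l.userid
FROM status_log l
WHERE l.status = 's'
  AND l.logcreated = (
        -- latest log entry for this user at or before the report date
        SELECT MAX(l2.logcreated)
        FROM status_log l2
        WHERE l2.userid = l.userid
          AND l2.logcreated <= TIMESTAMP '2008-01-15'
      );
```

The correlated subquery has to locate the latest entry per user before the status can be checked, which is where the extra cost comes from. Swapping 's' for 'a' gives the active list instead.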
With the additional column it would be more like:
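A sketch under stated assumptions: the end-date column is given the hypothetical name logended here, and every row is assumed to have it filled in (for example with a far-future date for the current status). Each row then covers the half-open interval from logcreated up to, but not including, logended:

```sql
-- logended is a hypothetical column name for the end date of each log row.
-- Assumes logended is never NULL in this variant.
SELECT userid
FROM status_log
WHERE status = 's'
  AND logcreated <= TIMESTAMP '2008-01-15'
  AND logended   >  TIMESTAMP '2008-01-15';
```

No subquery is needed, because each row already says for how long its status applied.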
(Apologies for any syntax errors, I don’t know PostgreSQL.)
To address some further issues raised by Phil:
This would appear in the table like this:
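As an illustration, using the sample data from the question and a hypothetical end-date column called logended, the two rows for userid 1 might be stored as follows (this assumes the tables from the question exist and that user 1 is already present in users):

```sql
-- logended is a hypothetical name for the new end-date column.
ALTER TABLE status_log ADD COLUMN logended TIMESTAMP;

-- Suspended from 1st January until 1st February 2008.
INSERT INTO status_log (userid, status, logcreated, logended)
VALUES (1, 's', '2008-01-01', '2008-02-01');

-- Active from 1st February 2008 onwards; no end date yet.
INSERT INTO status_log (userid, status, logcreated, logended)
VALUES (1, 'a', '2008-02-01', NULL);
```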
I used a null for the ‘to’ date of the current record. I could have used a future date like 2999-12-31 but null is preferable in some ways.
Yes, my query would have to be rewritten to allow for the null end date:
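A sketch of the null-aware version, again assuming a hypothetical logended column in which NULL means the status is still current:

```sql
-- A row matches if it started on or before the report date and either
-- has not ended yet (NULL) or ended after the report date.
SELECT userid
FROM status_log
WHERE status = 's'
  AND logcreated <= TIMESTAMP '2008-01-15'
  AND (logended IS NULL OR logended > TIMESTAMP '2008-01-15');
```

With the sample data, userid 1 is returned for 15th January 2008 but not for 15th February 2008, since the suspended row ends on 1st February.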
A downside of this design is that whenever a user’s status changes you have to end-date their current status_log row as well as create a new one. However, that isn’t difficult, and I think the query advantage probably outweighs this.
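The two-step status change could look roughly like this in PostgreSQL. The logended column name, the user, and the change date (1st March 2008) are all illustrative assumptions; wrapping both statements in one transaction keeps the log from ever showing zero or two current rows for a user:

```sql
BEGIN;

-- Step 1: close the user's current row (the one with no end date).
UPDATE status_log
SET logended = TIMESTAMP '2008-03-01'
WHERE userid = 1
  AND logended IS NULL;

-- Step 2: open a new current row starting at the same instant.
INSERT INTO status_log (userid, status, logcreated, logended)
VALUES (1, 's', TIMESTAMP '2008-03-01', NULL);

COMMIT;
```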