I would like to define a best practice for storing timestamps in my Postgres database in the context of a multi-timezone project.
I can:
- choose TIMESTAMP WITHOUT TIME ZONE and remember which timezone was used at insertion time for this field
- choose TIMESTAMP WITHOUT TIME ZONE and add another field which will contain the name of the timezone that was used at insertion time
- choose TIMESTAMP WITH TIME ZONE and insert the timestamps accordingly
I have a slight preference for option 3 (timestamp with time zone) but would like to have an educated opinion on the matter.
First off, PostgreSQL’s time handling and arithmetic are fantastic, and Option 3 is fine in the general case. It is, however, an incomplete view of time and timezones and can be supplemented:
- Store the name of a user’s time zone as a user preference (e.g. America/Los_Angeles, not -0700).
- Have users submit time data local to their own frame of reference (most likely with an offset from UTC, such as -0700).
- In the application, convert the time to UTC and store it in a TIMESTAMP WITH TIME ZONE column.
- Return times converted to the user’s time zone (i.e. convert from UTC to America/Los_Angeles).
- Set your database’s timezone to UTC.

This option doesn’t always work because it can be hard to get a user’s time zone, hence the hedge advice to use TIMESTAMP WITH TIME ZONE for lightweight applications. That said, let me explain some background aspects of this Option 4 in more detail.

Like Option 3, the reason for the WITH TIME ZONE is that the time at which something happened is an absolute moment in time. WITHOUT TIME ZONE yields a relative time. Don’t ever, ever, ever mix absolute and relative TIMESTAMPs.

From a programmatic and consistency perspective, ensure all calculations are made using UTC as the time zone. This isn’t a PostgreSQL requirement, but it helps when integrating with other programming languages or environments. Setting a CHECK on the column to make sure every write to the timestamp column has a time zone offset of 0 is a defensive position that prevents a few classes of bugs (e.g. a script dumps data to a file and something else sorts the time data using a lexical sort). Again, PostgreSQL doesn’t need this to do date calculations correctly or to convert between time zones (i.e. PostgreSQL is very adept at converting times between any two arbitrary time zones).

Such a CHECK ensures that data going into the database is stored with an offset of zero. It’s not 100% perfect, but it provides a strong enough anti-footshooting measure that makes sure the data is already converted to UTC. There are lots of opinions on how to do this, but this seems to be the best in practice from my experience.
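
A minimal sketch of such a constraint follows. The table and column names (my_tbl, my_timestamp) are made up for illustration. A TIMESTAMP WITH TIME ZONE value does not carry its own offset, so EXTRACT(TIMEZONE FROM ...) reports the offset, in seconds, of the session's timezone setting. In effect the check rejects writes from connections that are not running in UTC, which is probably why it is only a partial safeguard.

```sql
-- Hypothetical table: my_tbl and my_timestamp are illustrative names.
CREATE TABLE my_tbl (
    my_timestamp TIMESTAMP WITH TIME ZONE NOT NULL,
    -- Offset (in seconds) of the session time zone at this instant must be 0.
    CHECK (EXTRACT(TIMEZONE FROM my_timestamp) = 0)
);

-- A session running in a non-UTC zone: the insert violates the CHECK.
SET timezone = 'America/Los_Angeles';
INSERT INTO my_tbl (my_timestamp) VALUES (NOW());

-- A session running in UTC: the insert is accepted.
SET timezone = 'UTC';
INSERT INTO my_tbl (my_timestamp) VALUES (NOW());
```

Note that the constraint depends on a session setting and not only on the row's data, so any tool that loads data into this table also needs to connect with its timezone set to UTC.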
Criticisms of database time zone handling are largely justified (there are plenty of databases that handle this with great incompetence); however, PostgreSQL’s handling of timestamps and timezones is pretty awesome (despite a few “features” here and there). For example, one such feature:
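
The following psql sketch is one way to see the behaviour in question. The literal timestamp is an arbitrary value chosen for illustration, and the session zone is assumed to be America/Los_Angeles.

```sql
SET timezone = 'America/Los_Angeles';

-- NOW() is an absolute time.
SELECT pg_typeof(NOW());                     -- timestamp with time zone

-- AT TIME ZONE turns it into a relative time expressed in the target zone.
SELECT pg_typeof(NOW() AT TIME ZONE 'UTC');  -- timestamp without time zone

-- Casting the relative value back to an absolute one picks up the zone of the
-- connection (Los Angeles here), not UTC, so the result is a different instant.
SELECT (TIMESTAMP WITH TIME ZONE '2011-05-27 22:05:02+00' AT TIME ZONE 'UTC')
       ::TIMESTAMP WITH TIME ZONE AS reinterpreted;
-- 2011-05-27 22:05:02-07, seven hours later than the original instant
```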
Note that AT TIME ZONE 'UTC' strips time zone info and creates a relative TIMESTAMP WITHOUT TIME ZONE using your target’s frame of reference (UTC).

When converting from an incomplete TIMESTAMP WITHOUT TIME ZONE to a TIMESTAMP WITH TIME ZONE, the missing time zone is inherited from your connection.

The bottom line:
- store a user’s time zone as a named label (e.g. America/Los_Angeles) and not an offset from UTC (e.g. -0700)
- use UTC as the timezone in the database if possible

Random programming language note: Python’s datetime data type is very good at maintaining the distinction between absolute vs relative times (albeit frustrating at first until you supplement it with a library like PyTZ).

EDIT
Let me explain the difference between relative vs absolute a bit more.
Absolute time is used to record an event. Examples: “User 123 logged in” or “a graduation ceremony starts at 2011-05-28 2pm PST.” Regardless of your local time zone, if you could teleport to where the event occurred, you could witness the event happening. Most time data in a database is absolute (and therefore should be TIMESTAMP WITH TIME ZONE, ideally with a +0 offset and a textual label representing the rules governing the particular timezone, not an offset).

A relative event would be to record or schedule the time of something from the perspective of a yet-to-be-determined time zone. Examples: “our business’s doors open at 8am and close at 9pm”, “let’s meet every Monday at 7am for a weekly breakfast meeting,” or “every Halloween at 8pm.” In general, relative time is used in a template or factory for events, and absolute time is used for almost everything else. There is one rare exception that’s worth pointing out which should illustrate the value of relative times. For future events that are far enough in the future that there could be uncertainty about the absolute time at which something could occur, use a relative timestamp. Here’s a real world example:
Suppose it’s the year 2004 and you need to schedule a delivery on October 31st in 2008 at 1pm on the West Coast of the US (i.e. America/Los_Angeles/PST8PDT). If you stored that using absolute time using '2008-10-31 21:00:00.000000+00'::TIMESTAMP WITH TIME ZONE, the delivery would have shown up at 2pm because the US Government passed the Energy Policy Act of 2005 that changed the rules governing daylight saving time. In 2004 when the delivery was scheduled, the date 10-31-2008 would have been Pacific Standard Time (-0800), but starting in year 2005+ timezone databases recognized that 10-31-2008 would have been Pacific Daylight Time (-0700). Storing a relative timestamp with the time zone would have resulted in a correct delivery schedule because a relative timestamp is immune to Congress’ ill-informed tampering. Where the cutoff between relative and absolute times for scheduling lies is fuzzy, but my rule of thumb is that scheduling for anything in the future further than 3-6mo should make use of relative timestamps (scheduled = absolute vs planned = relative ???).

The other/last type of relative time is the INTERVAL. Example: “the session will time out 20 minutes after a user logs in”. An INTERVAL can be used correctly with either absolute timestamps (TIMESTAMP WITH TIME ZONE) or relative timestamps (TIMESTAMP WITHOUT TIME ZONE). It is equally correct to say, “a user session expires 20min after a successful login (login_utc + session_duration)” or “our morning breakfast meeting can only last 60 minutes (recurring_start_time + meeting_length)”.

Last bits of confusion:
DATE, TIME, TIME WITHOUT TIME ZONE and TIME WITH TIME ZONE are all relative data types. For example: '2011-05-28'::DATE represents a relative date since you have no time zone information which could be used to identify midnight. Similarly, '23:23:59'::TIME is relative because you don’t know either the time zone or the DATE represented by the time. Even with '23:59:59-07'::TIME WITH TIME ZONE, you don’t know what the DATE would be. And lastly, DATE with a time zone is not in fact a DATE, it is a TIMESTAMP WITH TIME ZONE.

Putting dates and time zones in databases is a good thing, but it is easy to get subtly incorrect results. Minimal additional effort is required to store time information correctly and completely; however, that doesn’t mean the extra effort is always required.
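
To make the EDIT a little more concrete, here is a rough psql sketch of three of its points: the 2008 delivery, INTERVAL arithmetic, and a DATE that is given a time zone. The literal values are either taken from the text above or picked arbitrarily for illustration, and the results in the comments assume a time zone database that includes the post-2005 US rules.

```sql
SET timezone = 'UTC';

-- Relative (planned) delivery time: wall-clock time plus a named zone,
-- resolved with whatever rules the time zone database holds at query time.
SELECT TIMESTAMP WITHOUT TIME ZONE '2008-10-31 13:00:00'
       AT TIME ZONE 'America/Los_Angeles' AS delivery_utc;
-- 2008-10-31 20:00:00+00 (PDT, UTC-7)

-- Absolute time precomputed in 2004 under the old rules (PST, UTC-8).
SELECT TIMESTAMP WITH TIME ZONE '2008-10-31 21:00:00+00'
       AT TIME ZONE 'America/Los_Angeles' AS local_wall_clock;
-- 2008-10-31 14:00:00, i.e. 2pm instead of the planned 1pm

-- An INTERVAL works with absolute and relative timestamps alike.
SELECT TIMESTAMP WITH TIME ZONE '2011-05-28 21:00:00+00'
       + INTERVAL '20 minutes' AS session_expires_utc;
-- 2011-05-28 21:20:00+00
SELECT TIMESTAMP WITHOUT TIME ZONE '2011-05-30 07:00:00'
       + INTERVAL '60 minutes' AS meeting_ends;
-- 2011-05-30 08:00:00

-- A DATE only becomes an instant once a zone supplies its midnight;
-- the cast below uses the zone of the connection.
SET timezone = 'America/Los_Angeles';
SELECT '2011-05-28'::DATE::TIMESTAMP WITH TIME ZONE AS midnight_local;
-- 2011-05-28 00:00:00-07
```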