I’ve got a Postgres 9.1 database that contains weather information. The dataset consists of approximately 3.1 million rows.
It takes about 2 minutes to load the data from a CSV file, and a little less to create a multicolumn index.
Every 6 hours I need to completely refresh the dataset. My current thinking is to import the new dataset into a database with a different name, such as “weather_imported”, and once the import and index creation are finished, drop the original database and rename the imported one.
In theory, clients would continue to query the database during this operation, though if that has ill effects, I could probably arrange to have the clients silently ignore a few errors.
Questions:
- Will that strategy work?
- If a client happened to be in the process of running a query at the time of the DB drop, my assumption is that the database would not complete the drop until the query had finished. True?
- What if a query happened between the time the DB was dropped and the rename? I assume a “database not found” error.
- Is there a better strategy?
Consider the following strategy as an alternative:
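The steps themselves are not shown here, so what follows is only one plausible reading, assuming the idea is to stay inside a single database: load the CSV into a staging table, build its index, then swap the tables in one transaction. The table names `weather` and `weather_imported` and the helper names `swap_statements` and `swap_tables` are hypothetical, and `conn` is assumed to be a Python DB-API connection (for example from psycopg2) with autocommit left off. Postgres DDL is transactional, so the drop and the rename become visible to other sessions together at commit.

```python
def swap_statements(live="weather", staging="weather_imported"):
    """Return the SQL that replaces `live` with the freshly loaded `staging`.

    Table names are illustrative assumptions, not taken from the original post.
    """
    for name in (live, staging):
        # Identifiers cannot be bound as query parameters, so accept plain names only.
        if not name.isidentifier():
            raise ValueError(f"unsafe table name: {name!r}")
    return [
        f"DROP TABLE IF EXISTS {live}",
        f"ALTER TABLE {staging} RENAME TO {live}",
    ]


def swap_tables(conn, live="weather", staging="weather_imported"):
    """Run the swap as one transaction on a DB-API connection (autocommit off)."""
    cur = conn.cursor()
    try:
        for sql in swap_statements(live, staging):
            cur.execute(sql)
        conn.commit()    # drop and rename take effect together
    except Exception:
        conn.rollback()  # on failure the old table stays in place
        raise
    finally:
        cur.close()
```

The import and index creation would run against the staging table before `swap_tables(conn)` is called, so the live table is only locked for the brief swap. `DROP TABLE` takes an exclusive lock, so it waits for queries already running against the old table, and queries that arrive during the swap wait for the commit. A client caught in that window may still see an occasional error, which fits the plan of letting clients ignore a few failures.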
Presto — no need to shuffle databases around.