I have to generate about a million random trips between about 40K destinations. Each destination has its own weight (total_probability): the higher it is, the more trips should start or end at that place.
There are two options. The trips can be generated randomly, with destinations (start and end points) weighted by probability. Alternatively, it's possible to pre-calculate an exact number of trips per destination (divide each weight by the sum of weights, multiply by 1M and round to integers).
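The pre-calculation option can be sketched in a few lines of Python. This is a minimal sketch assuming the weights have already been fetched from dests into a dict keyed by id; the function name `trip_quotas` is hypothetical. Because each share is rounded independently, the quotas can add up to slightly more or fewer than the requested total.

```python
from typing import Dict


def trip_quotas(weights: Dict[int, float], n_trips: int = 1_000_000) -> Dict[int, int]:
    """Give each destination id a whole number of trip endpoints proportional
    to its weight: weight / sum(weights) * n_trips, rounded to an integer.

    Every share is rounded on its own, so the quotas may sum to a few more or
    fewer than n_trips.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {dest_id: round(w / total * n_trips) for dest_id, w in weights.items()}


if __name__ == "__main__":
    # Toy weights standing in for dests.total_probability (hypothetical data).
    print(trip_quotas({1: 1.0, 2: 3.0}, n_trips=1000))  # {1: 250, 2: 750}
```

With three equal weights and `n_trips=10`, each destination gets 3, so the total comes out at 9 rather than 10. If an exact total matters, the leftover can be handed to the destinations with the largest rounding remainders.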
The problem is how to do this in PostgreSQL without generating a 40K × 40K table of all destination pairs.
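The pair explosion can be avoided by sampling each endpoint independently and storing only the pairs that are actually drawn. Here is a minimal Python sketch of the weighted-random option, assuming endpoints are chosen independently with probability proportional to their weight. `sample_trips` is a hypothetical name, and the standard library's `random.Random.choices` does the weighted draw.

```python
import random
from collections import Counter
from typing import Dict, Optional


def sample_trips(
    weights: Dict[int, float],
    n_trips: int = 1_000_000,
    seed: Optional[int] = None,
) -> Counter:
    """Draw n_trips (from_id, to_id) pairs, choosing each endpoint
    independently with probability proportional to its weight.

    Only pairs that are actually drawn are stored, so memory grows with the
    number of distinct sampled pairs (at most n_trips), not len(weights) ** 2.
    """
    rng = random.Random(seed)
    ids = list(weights)
    w = [weights[i] for i in ids]
    origins = rng.choices(ids, weights=w, k=n_trips)
    destinations = rng.choices(ids, weights=w, k=n_trips)
    # Collapse repeated pairs into counts, matching trips(from_id, to_id, trips_num).
    return Counter(zip(origins, destinations))


if __name__ == "__main__":
    trips = sample_trips({1: 1.0, 2: 3.0, 3: 6.0}, n_trips=10_000, seed=42)
    print(sum(trips.values()))  # 10000
    print(trips.most_common(3))
```

Because the endpoints are drawn independently, a trip can start and end at the same destination. Whether such pairs should be kept or discarded is not stated in the question, so the sketch leaves them in.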
                 Table "public.dests"
      Column       |       Type       | Modifiers
-------------------+------------------+-----------
 id                | integer          |
 total_probability | double precision |

              Table "public.trips"
   Column   |       Type       | Modifiers
------------+------------------+-----------
 from_id    | integer          |
 to_id      | integer          |
 trips_num  | integer          |
 ...
 (some other metrics...)
The primary key for trips is (from_id, to_id).
Should I generate a table with 1M records and then update it iteratively, or will a for loop with 1M inserts be fast enough? I work on a lightweight 2-core laptop.
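A third option is to build all the rows outside the database and load them in one bulk statement instead of 1M separate inserts. The sketch below assumes the pair counts are already in a Python mapping, for example from a weighted sampler. It serializes them into PostgreSQL's `COPY` text format: one row per line, with columns separated by tabs. `to_copy_buffer` is a hypothetical helper name.

```python
import io
from typing import Mapping, Tuple


def to_copy_buffer(trips: Mapping[Tuple[int, int], int]) -> io.StringIO:
    """Serialize {(from_id, to_id): trips_num} into COPY text format:
    tab-separated columns, one newline-terminated row per pair."""
    buf = io.StringIO()
    for (from_id, to_id), trips_num in trips.items():
        buf.write(f"{from_id}\t{to_id}\t{trips_num}\n")
    buf.seek(0)  # rewind so the reader starts at the first row
    return buf


if __name__ == "__main__":
    print(repr(to_copy_buffer({(1, 2): 3, (2, 1): 1}).read()))
```

With psycopg2, a buffer like this can be passed to `cursor.copy_expert("COPY trips (from_id, to_id, trips_num) FROM STDIN", buf)`. Check the driver documentation for your version before relying on it. Each (from_id, to_id) key appears only once in the mapping, so loading into an empty trips table should not violate the primary key.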
P.S. I gave up and did this in Python. To run the set of queries and the transformation, I'll execute the SQL scripts from Python rather than from a shell script. Thanks for the suggestions!
In PostgreSQL 9.1, you can use TRIGGERs on VIEWs, which effectively let you create materialized views (albeit manually). Your first run may be expensive, and a loop is probably the way to go for it. After that, I'd use a series of TRIGGERs to maintain the data in a table. At the end of the day, you need to decide whether to calculate the results on every query or to memoize them via a materialized view.