The Archive Base

Editorial Team
Asked: May 11, 2026


I’m dealing with a Postgres table (called ‘lives’) that contains records with columns for time_stamp, usr_id, trans_id, and lives_remaining. I need a query that will give me the most recent lives_remaining total for each usr_id, given that:

  1. There are multiple users (distinct usr_id’s)
  2. time_stamp is not a unique identifier: sometimes user events (one per row in the table) will occur with the same time_stamp.
  3. trans_id is unique only for very small time ranges: over time it repeats
  4. lives_remaining (for a given user) can both increase and decrease over time

example:

time_stamp|lives_remaining|usr_id|trans_id
-----------------------------------------
  07:00  |       1       |   1  |   1
  09:00  |       4       |   2  |   2
  10:00  |       2       |   3  |   3
  10:00  |       1       |   2  |   4
  11:00  |       4       |   1  |   5
  11:00  |       3       |   1  |   6
  13:00  |       3       |   3  |   1

As I will need to access other columns of the row with the latest data for each given usr_id, I need a query that gives a result like this:

time_stamp|lives_remaining|usr_id|trans_id
-----------------------------------------
  11:00  |       3       |   1  |   6
  10:00  |       1       |   2  |   4
  13:00  |       3       |   3  |   1

As mentioned, each usr_id can gain or lose lives, and sometimes these timestamped events occur so close together that they have the same timestamp! Therefore this query won’t work:

SELECT b.time_stamp,b.lives_remaining,b.usr_id,b.trans_id FROM
      (SELECT usr_id, max(time_stamp) AS max_timestamp
       FROM lives GROUP BY usr_id ORDER BY usr_id) a
JOIN lives b ON a.max_timestamp = b.time_stamp

Instead, I need to use both time_stamp (first) and trans_id (second) to identify the correct row. I also then need to pass that information from the subquery to the main query that will provide the data for the other columns of the appropriate rows. This is the hacked up query that I’ve gotten to work:

SELECT b.time_stamp,b.lives_remaining,b.usr_id,b.trans_id FROM
      (SELECT usr_id, max(time_stamp || '*' || trans_id)
       AS max_timestamp_transid
       FROM lives GROUP BY usr_id ORDER BY usr_id) a
JOIN lives b ON a.max_timestamp_transid = b.time_stamp || '*' || b.trans_id
ORDER BY b.usr_id

Okay, so this works, but I don’t like it. It requires a query within a query and a self join, and it seems to me that it could be much simpler if I could just grab the row that MAX found to have the largest timestamp and trans_id. The table ‘lives’ has tens of millions of rows to parse, so I’d like this query to be as fast and efficient as possible. I’m new to RDBMSs, and to Postgres in particular, so I know that I need to make effective use of the proper indexes. I’m a bit lost on how to optimize.

I found a similar discussion here. Can I perform some type of Postgres equivalent to an Oracle analytic function?

Any advice on accessing related column information used by an aggregate function (like MAX), creating indexes, and creating better queries would be much appreciated!

P.S. You can use the following to create my example case:

create TABLE lives (time_stamp timestamp, lives_remaining integer,
                    usr_id integer, trans_id integer);
insert into lives values ('2000-01-01 07:00', 1, 1, 1);
insert into lives values ('2000-01-01 09:00', 4, 2, 2);
insert into lives values ('2000-01-01 10:00', 2, 3, 3);
insert into lives values ('2000-01-01 10:00', 1, 2, 4);
insert into lives values ('2000-01-01 11:00', 4, 1, 5);
insert into lives values ('2000-01-01 11:00', 3, 1, 6);
insert into lives values ('2000-01-01 13:00', 3, 3, 1);
1 Answer

  1. Added an answer on May 11, 2026 at 7:07 am

    The figures below were measured on a table with 158k pseudo-random rows (usr_id uniformly distributed between 0 and 10k, trans_id uniformly distributed between 0 and 30).

    By query cost, below, I am referring to the cost estimate produced by Postgres’ cost-based optimizer (with Postgres’ default xxx_cost values), which is a weighted estimate of the required I/O and CPU resources. You can obtain it by firing up pgAdminIII and running ‘Query/Explain (F7)’ on the query with ‘Query/Explain options’ set to ‘Analyze’.
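
    The same information is available without pgAdminIII through the EXPLAIN statement, for example from psql. The sketch below uses a simple aggregate over lives as a stand-in query; it is not one of the benchmarked queries. In the output, the estimated cost is the second number of the `cost=` range on the top plan node.

```sql
-- Planner estimate only: the query is not executed.
EXPLAIN
SELECT usr_id, MAX(time_stamp)
FROM lives
GROUP BY usr_id;

-- Executes the query and reports actual row counts and timings
-- next to the planner's estimates.
EXPLAIN ANALYZE
SELECT usr_id, MAX(time_stamp)
FROM lives
GROUP BY usr_id;
```

    Note that EXPLAIN ANALYZE really runs the statement, so the timings it reports are subject to the same caching effects as any other execution.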

    • Quassnoy’s query has a cost estimate of 745k (!), and completes in 1.3 seconds (given a compound index on (usr_id, trans_id, time_stamp))
    • Bill’s query has a cost estimate of 93k, and completes in 2.9 seconds (given a compound index on (usr_id, trans_id))
    • Query #1 below has a cost estimate of 16k, and completes in 800ms (given a compound index on (usr_id, trans_id, time_stamp))
    • Query #2 below has a cost estimate of 14k, and completes in 800ms (given a compound function index on (usr_id, EXTRACT(EPOCH FROM time_stamp), trans_id))
      • this is Postgres-specific
    • Query #3 below (Postgres 8.4+) has a cost estimate and completion time comparable to (or better than) query #2 (given a compound index on (usr_id, time_stamp, trans_id)); it has the advantage of scanning the lives table only once and, should you temporarily increase (if needed) work_mem to accommodate the sort in memory, it will be by far the fastest of all queries.

    All times above include retrieval of the full 10k rows result-set.
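
    A table of similar shape can be generated with generate_series, should you want to repeat the comparison. This is only a sketch: the table name lives_test, the one-day range for time_stamp and the 0 to 10 range for lives_remaining are assumptions of mine, as only the row count and the usr_id and trans_id distributions are stated above.

```sql
-- Hypothetical scratch table; the name and the time_stamp /
-- lives_remaining ranges are assumptions, not the original test data.
CREATE TABLE lives_test AS
SELECT
  timestamp '2000-01-01 00:00'
    + floor(random() * 86400)::int * interval '1 second' AS time_stamp,
  floor(random() * 11)::int                              AS lives_remaining,
  floor(random() * 10001)::int                           AS usr_id,   -- 0 .. 10k
  floor(random() * 31)::int                              AS trans_id  -- 0 .. 30
FROM generate_series(1, 158000);
```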

    Your goal is a minimal cost estimate and a minimal query execution time, with an emphasis on estimated cost. Query execution time can depend significantly on runtime conditions (e.g. whether relevant rows are already fully cached in memory or not), whereas the cost estimate does not. On the other hand, keep in mind that the cost estimate is exactly that, an estimate.

    The best query execution time is obtained when running on a dedicated database without load (e.g. playing with pgAdminIII on a development PC.) Query time will vary in production based on actual machine load/data access spread. When one query appears slightly faster (<20%) than the other but has a much higher cost, it will generally be wiser to choose the one with higher execution time but lower cost.

    When you expect that there will be no competition for memory on your production machine at the time the query is run (e.g. the RDBMS cache and filesystem cache won’t be thrashed by concurrent queries and/or filesystem activity) then the query time you obtained in standalone (e.g. pgAdminIII on a development PC) mode will be representative. If there is contention on the production system, query time will degrade proportionally to the estimated cost ratio, as the query with the lower cost does not rely as much on cache whereas the query with higher cost will revisit the same data over and over (triggering additional I/O in the absence of a stable cache), e.g.:

                  cost | time (dedicated machine) |     time (under load) |
    -------------------+--------------------------+-----------------------+
    some query A:   5k | (all data cached)  900ms | (less i/o)     1000ms |
    some query B:  50k | (all data cached)  900ms | (lots of i/o) 10000ms |

    Do not forget to run ANALYZE lives once after creating the necessary indices.
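
    As a sketch of what that looks like for the indexes mentioned in the list above (the index names are made up for illustration; create only the index that matches the query you intend to run):

```sql
-- Compound index as described for query #3.
CREATE INDEX lives_usr_ts_trans_idx
  ON lives (usr_id, time_stamp, trans_id);

-- Compound function index as described for query #2; the expression
-- needs its own pair of parentheses inside the column list.
CREATE INDEX lives_usr_epoch_trans_idx
  ON lives (usr_id, (EXTRACT(EPOCH FROM time_stamp)), trans_id);

-- Refresh the planner statistics once the indexes exist.
ANALYZE lives;
```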


    Query #1

    -- incrementally narrow down the result set via inner joins
    --  the CBO may elect to perform one full index scan combined
    --  with cascading index lookups, or as hash aggregates terminated
    --  by one nested index lookup into lives - on my machine
    --  the latter query plan was selected given my memory settings and
    --  histogram
    SELECT
      l1.*
    FROM
      lives AS l1
    INNER JOIN (
        SELECT
          usr_id,
          MAX(time_stamp) AS time_stamp_max
        FROM
          lives
        GROUP BY
          usr_id
      ) AS l2
    ON
      l1.usr_id     = l2.usr_id AND
      l1.time_stamp = l2.time_stamp_max
    INNER JOIN (
        SELECT
          usr_id,
          time_stamp,
          MAX(trans_id) AS trans_max
        FROM
          lives
        GROUP BY
          usr_id, time_stamp
      ) AS l3
    ON
      l1.usr_id     = l3.usr_id AND
      l1.time_stamp = l3.time_stamp AND
      l1.trans_id   = l3.trans_max

    Query #2

    -- cheat to obtain a max of the (time_stamp, trans_id) tuple in one pass
    -- this results in a single table scan and one nested index lookup into lives,
    --  by far the least I/O intensive operation even in case of great scarcity
    --  of memory (least reliant on cache for the best performance)
    SELECT
      l1.*
    FROM
      lives AS l1
    INNER JOIN (
       SELECT
         usr_id,
         MAX(ARRAY[EXTRACT(EPOCH FROM time_stamp),trans_id])
           AS compound_time_stamp
       FROM
         lives
       GROUP BY
         usr_id
      ) AS l2
    ON
      l1.usr_id = l2.usr_id AND
      EXTRACT(EPOCH FROM l1.time_stamp) = l2.compound_time_stamp[1] AND
      l1.trans_id = l2.compound_time_stamp[2]

    2013/01/29 update

    Finally, as of version 8.4, Postgres supports window functions, meaning you can write something as simple and efficient as:

    Query #3

    -- use Window Functions
    -- performs a SINGLE scan of the table
    SELECT DISTINCT ON (usr_id)
      last_value(time_stamp) OVER wnd,
      last_value(lives_remaining) OVER wnd,
      usr_id,
      last_value(trans_id) OVER wnd
    FROM lives
    WINDOW wnd AS (
      PARTITION BY usr_id ORDER BY time_stamp, trans_id
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    );
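
    Since query #3 already relies on the Postgres-specific DISTINCT ON, it may be worth trying the variant below, which drops the window and lets ORDER BY decide which row is kept for each usr_id. It was not part of the measurements above, so treat it as a sketch to benchmark on your own data rather than a known winner.

```sql
-- DISTINCT ON keeps the first row of each usr_id group in ORDER BY order,
-- so sorting time_stamp and trans_id descending keeps the latest event.
SELECT DISTINCT ON (usr_id)
  time_stamp,
  lives_remaining,
  usr_id,
  trans_id
FROM lives
ORDER BY usr_id, time_stamp DESC, trans_id DESC;
```

    The leading ORDER BY expressions must match the DISTINCT ON list. An index declared as (usr_id, time_stamp DESC, trans_id DESC) matches this sort order directly, though whether the planner uses it depends on your data and settings.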