I am trying to write a query in PostgreSQL that pulls a set of ordered data and filters it by a distinct field. I also need to pull several other fields from the same table row, but they need to be left out of the distinct evaluation. Example:
SELECT DISTINCT(user_id) user_id,
created_at
FROM creations
ORDER BY created_at
LIMIT 20
I need the user_id to be DISTINCT, but don’t care whether the created_at date is unique or not. Because the created_at date is being included in the evaluation, I am getting duplicate user_id values in my result set.
Also, the data must be ordered by the date, so using DISTINCT ON is not an option here. It requires that the DISTINCT ON field be the first field in the ORDER BY clause, and that does not deliver the results that I seek.
How do I properly use the DISTINCT clause but limit its scope to only one field while still selecting other fields?
As you’ve discovered, standard SQL treats DISTINCT as applying to the whole select-list, not just one column or a few columns. The reason for this is that it’s ambiguous what value to put in the columns you exclude from the DISTINCT. For the same reason, standard SQL doesn’t allow you to have ambiguous columns in a query with GROUP BY.
But PostgreSQL has a nonstandard extension to SQL to allow for what you’re asking: DISTINCT ON (expr).
You have to include the distinct expression(s) as the leftmost part of your ORDER BY clause.
See the manual on DISTINCT Clause for more information.
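Since the question also needs the final result ordered by created_at, one way to reconcile that with the leftmost-ORDER BY requirement is to do the DISTINCT ON in a subquery and re-sort in the outer query. This is only a sketch against the creations table from the question. It assumes you want each user's earliest created_at (use created_at DESC in the inner ORDER BY for the latest instead), and the alias first_per_user is made up for illustration.

```sql
-- Inner query: keep one row per user_id. DISTINCT ON requires user_id to
-- lead the ORDER BY; created_at then decides which row survives per user
-- (assumption: the earliest one).
SELECT user_id, created_at
FROM (
    SELECT DISTINCT ON (user_id)
           user_id,
           created_at
    FROM creations
    ORDER BY user_id, created_at
) AS first_per_user        -- hypothetical alias
-- Outer query: now that user_id is unique, sort by date and limit.
ORDER BY created_at
LIMIT 20;
```

Any other columns from the same row can be added to both select-lists; they come from whichever row the inner ORDER BY puts first for each user_id.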