I’m trying to select only one row from each portal (the last one by date), but I’m having trouble with GROUP BY/DISTINCT.
Using this code, I can select only the PortalId values that I need, but without any of the data:
Select relNews.PortalId
from news
left join relNews on relNews.NewsId= news.NewsId
group by relNews.PortalId
When I add one or more data columns, as in this code, the SELECT returns every row instead of only one for each portal:
Select relNews.PortalId, news.NewsId
from news
left join relNews on relNews.NewsId= news.NewsId
group by relNews.PortalId, news.NewsId
I know there is a small trick I’m missing here, but I just can’t remember what it is.
UPDATE
Let’s make virtual tables for this example. The tables are news and relNews (I’ve kept them as short as possible here):
Table news
- NewsId
- Title
- Description
- Date
Table relNews
- RelNewsId
- NewsId
- PortalId
NOTE:
- relNews can have N rows with the same NewsId. I need to select the latest row for each PortalId (based on news.Date).
Let’s say:
Table news
- NewsId == 1, Title == 'test', Description == 'test', Date == '2013-01-01 00:00:00'
- NewsId == 2, Title == 'test2', Description == 'test2', Date == '2013-01-01 03:00:00'
- NewsId == 3, Title == 'test3', Description == 'test3', Date == '2013-01-02 00:00:00'
Table relNews
- RelNewsId == 1, NewsId == 1, PortalId == 1
- RelNewsId == 2, NewsId == 1, PortalId == 2
- RelNewsId == 3, NewsId == 2, PortalId == 1
- RelNewsId == 4, NewsId == 3, PortalId == 3
This data should return RelNewsId == 2, RelNewsId == 3 and RelNewsId == 4 (the newest row for portals 2, 1 and 3 respectively).
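To reproduce the example, the sample data above can be loaded into a scratch SQL Server database with a script like the one below. This is a sketch: the column types and keys are assumptions (the question lists only column names), and the dates use the ISO 8601 `T` form so DATETIME parsing does not depend on the session's date format.

```sql
-- Hypothetical column types; only the names come from the question.
CREATE TABLE news (
    NewsId      INT           NOT NULL PRIMARY KEY,
    Title       NVARCHAR(100) NOT NULL,
    Description NVARCHAR(MAX) NULL,
    Date        DATETIME      NOT NULL
);

CREATE TABLE relNews (
    RelNewsId INT NOT NULL PRIMARY KEY,
    NewsId    INT NOT NULL REFERENCES news (NewsId),
    PortalId  INT NOT NULL
);

-- Multi-row VALUES lists need SQL Server 2008 or later.
INSERT INTO news (NewsId, Title, Description, Date) VALUES
    (1, 'test',  'test',  '2013-01-01T00:00:00'),
    (2, 'test2', 'test2', '2013-01-01T03:00:00'),
    (3, 'test3', 'test3', '2013-01-02T00:00:00');

INSERT INTO relNews (RelNewsId, NewsId, PortalId) VALUES
    (1, 1, 1),
    (2, 1, 2),
    (3, 2, 1),
    (4, 3, 3);
```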
I can get the result that I want with this code:
-- ORDER BY can't be attached to an individual UNION member, so each TOP 1 lookup is wrapped in a derived table
SELECT PortalId, NewsId, Date
FROM (SELECT TOP 1 relNews.PortalId, news.NewsId, news.Date
      FROM news
      LEFT JOIN relNews ON relNews.NewsId = news.NewsId
      WHERE relNews.PortalId = 1
      ORDER BY news.Date DESC) AS p1
UNION
SELECT PortalId, NewsId, Date
FROM (SELECT TOP 1 relNews.PortalId, news.NewsId, news.Date
      FROM news
      LEFT JOIN relNews ON relNews.NewsId = news.NewsId
      WHERE relNews.PortalId = 2
      ORDER BY news.Date DESC) AS p2
UNION
SELECT PortalId, NewsId, Date
FROM (SELECT TOP 1 relNews.PortalId, news.NewsId, news.Date
      FROM news
      LEFT JOIN relNews ON relNews.NewsId = news.NewsId
      WHERE relNews.PortalId = 3
      ORDER BY news.Date DESC) AS p3
Then I get all 3 results.
You have to provide some way of indicating which one of the news rows you want when there could be multiple for one relNews row. As you’ve discovered, the moment you GROUP BY a column that can have many values for each parent, you get multiple rows. There are several ways to do this, and since you are using SQL Server 2008 you have all of the following options.

Option #1 – CROSS/OUTER APPLY. Change the OUTER APPLY to CROSS APPLY if you want to exclude relNews rows when there is no matching News row.

Option #2 – Row_Number().
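A minimal sketch of the CROSS/OUTER APPLY and Row_Number() approaches, written against the question's news/relNews tables rather than reproduced from the original answer. It assumes there is no separate portal table, so the distinct PortalId values in relNews drive the APPLY, and it uses relNews.RelNewsId as a tie-breaker when two rows in a portal share the same news.Date:

```sql
-- CROSS APPLY: for each portal, run a TOP 1 ... ORDER BY Date DESC lookup.
-- OUTER APPLY instead would keep a portal even when the lookup finds no
-- matching news row (the news columns would then come back NULL).
SELECT p.PortalId, x.RelNewsId, x.NewsId, x.Title, x.Date
FROM (SELECT DISTINCT PortalId FROM relNews) AS p
CROSS APPLY (
    SELECT TOP 1 r.RelNewsId, n.NewsId, n.Title, n.Date
    FROM relNews AS r
    INNER JOIN news AS n ON n.NewsId = r.NewsId
    WHERE r.PortalId = p.PortalId
    ORDER BY n.Date DESC, r.RelNewsId DESC
) AS x;

-- ROW_NUMBER(): number the rows within each portal, newest first, keep row 1.
SELECT RelNewsId, PortalId, NewsId, Title, Date
FROM (
    SELECT r.RelNewsId, r.PortalId, n.NewsId, n.Title, n.Date,
           ROW_NUMBER() OVER (PARTITION BY r.PortalId
                              ORDER BY n.Date DESC, r.RelNewsId DESC) AS rn
    FROM relNews AS r
    INNER JOIN news AS n ON n.NewsId = r.NewsId
) AS ranked
WHERE rn = 1;
```

With the question's sample data, each query returns RelNewsId 2, 3 and 4 (in no guaranteed order; add an ORDER BY if that matters).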
Option #3 – Aggregate. This is more complicated than it may seem to require, but I assumed that newsDate is NOT unique per newsID. If it is unique, then it’s simpler. This version works in SQL 2000. It is also probably the worst-performing query of all the options I am providing. Note that UniqueColumn is any column that has guaranteed unique values per newsID and can be used to select among ties for newsDate.

If newsDate is truly unique per news.Id, the simpler form of the aggregate query is enough.

Option #4 – Subquery. This has the problem that you can only pull one column at a time, but it works in SQL 2000 and should perform well with proper indexes. For multiple columns it may perform badly, as it may do a separate query for each one.
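The aggregate and subquery approaches can be sketched as follows. Again these are not the answer's original queries: they target the question's tables, newsDate is taken to mean news.Date, relNews.RelNewsId stands in for UniqueColumn, and "unique" is read as unique within a portal. They avoid APPLY and ROW_NUMBER(), so they should also run on SQL 2000:

```sql
-- Aggregate, ties possible: find the latest Date per portal, then break
-- ties on that date by taking the highest RelNewsId.
SELECT r.RelNewsId, r.PortalId, n.NewsId, n.Title, n.Date
FROM relNews AS r
INNER JOIN news AS n ON n.NewsId = r.NewsId
INNER JOIN (
    SELECT r2.PortalId, MAX(r2.RelNewsId) AS RelNewsId
    FROM relNews AS r2
    INNER JOIN news AS n2 ON n2.NewsId = r2.NewsId
    INNER JOIN (
        SELECT r3.PortalId, MAX(n3.Date) AS MaxDate
        FROM relNews AS r3
        INNER JOIN news AS n3 ON n3.NewsId = r3.NewsId
        GROUP BY r3.PortalId
    ) AS latest ON latest.PortalId = r2.PortalId AND latest.MaxDate = n2.Date
    GROUP BY r2.PortalId
) AS pick ON pick.RelNewsId = r.RelNewsId;

-- Aggregate, Date unique per portal: join back on MAX(Date) directly.
SELECT r.RelNewsId, r.PortalId, n.NewsId, n.Title, n.Date
FROM relNews AS r
INNER JOIN news AS n ON n.NewsId = r.NewsId
INNER JOIN (
    SELECT r2.PortalId, MAX(n2.Date) AS MaxDate
    FROM relNews AS r2
    INNER JOIN news AS n2 ON n2.NewsId = r2.NewsId
    GROUP BY r2.PortalId
) AS latest ON latest.PortalId = r.PortalId AND latest.MaxDate = n.Date;

-- Subquery: one correlated TOP 1 lookup per column you need.
SELECT p.PortalId,
       (SELECT TOP 1 r.RelNewsId
        FROM relNews AS r
        INNER JOIN news AS n ON n.NewsId = r.NewsId
        WHERE r.PortalId = p.PortalId
        ORDER BY n.Date DESC, r.RelNewsId DESC) AS LatestRelNewsId
FROM (SELECT DISTINCT PortalId FROM relNews) AS p;
```

Each further column in the subquery version needs its own copy of the correlated lookup, which is why it tends to slow down as columns are added.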
Option #5 – Subquery in the ON clause. This is probably the best query for SQL 2000 if you’re going to pull multiple columns. It has to hit the News table twice, but with proper indexes it shouldn’t be so bad. Note that UniqueColumn is any column that has guaranteed unique values per newsID and can be used to select among ties for newsDate.

Option #6 – Logical-Last. Another possibly good performer for SQL 2000. This is the same logical query as fo_x86’s, but expressed a little differently.
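A sketch of the subquery-in-ON and Logical-Last ideas on the question's tables. Neither the answer's original code nor fo_x86's query is available here, so the NOT EXISTS form of "Logical-Last" is an assumption about how that technique might look; relNews.RelNewsId again plays the role of UniqueColumn:

```sql
-- Subquery in the ON clause: keep a relNews row only if it is the row a
-- TOP 1 lookup picks as newest for its portal.
SELECT r.RelNewsId, r.PortalId, n.NewsId, n.Title, n.Date
FROM relNews AS r
INNER JOIN news AS n
    ON n.NewsId = r.NewsId
   AND r.RelNewsId = (
        SELECT TOP 1 r2.RelNewsId
        FROM relNews AS r2
        INNER JOIN news AS n2 ON n2.NewsId = r2.NewsId
        WHERE r2.PortalId = r.PortalId
        ORDER BY n2.Date DESC, r2.RelNewsId DESC
   );

-- Logical-Last (assumed form): keep a row when no later row exists for the
-- same portal, where "later" means a newer Date, or the same Date with a
-- higher RelNewsId.
SELECT r.RelNewsId, r.PortalId, n.NewsId, n.Title, n.Date
FROM relNews AS r
INNER JOIN news AS n ON n.NewsId = r.NewsId
WHERE NOT EXISTS (
    SELECT *
    FROM relNews AS r2
    INNER JOIN news AS n2 ON n2.NewsId = r2.NewsId
    WHERE r2.PortalId = r.PortalId
      AND (n2.Date > n.Date
           OR (n2.Date = n.Date AND r2.RelNewsId > r.RelNewsId))
);
```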
Option #1 is probably the best for you in the database version you’re using.
One last note: as always, testing is required. The performance of all these queries will depend on a lot of factors: the pattern of your data (many dates for each news item or few), the exact indexes, how wide the tables are, and whether you add extra conditions (for example, the Row_Number() query I suggested will not do so well if your conditions on the outer relNews table return only a few rows). If you find that one query is not providing satisfactory execution times, try a different one.