The box this query is running on is a dedicated server in a datacenter:
AMD Opteron 1354 Quad-Core 2.20GHz, 2GB of RAM, Windows Server 2008 x64 (yes, I know I only have 2GB of RAM; I'm upgrading to 8GB when the project goes live).
So I created 250,000 dummy rows in a table to stress test some of the queries that LINQ to SQL generates and make sure they're not too terrible, and I noticed that one of them was taking an absurd amount of time.
I had this query down to 17 seconds with indexes, but I removed them for the sake of this question so I could walk through it from start to finish. The only indexes are the primary keys.
Stories table:
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [UserID] [int] NOT NULL,
    [CategoryID] [int] NOT NULL,
    [VoteCount] [int] NOT NULL,
    [CommentCount] [int] NOT NULL,
    [Title] [nvarchar](96) NOT NULL,
    [Description] [nvarchar](1024) NOT NULL,
    [CreatedAt] [datetime] NOT NULL,
    [UniqueName] [nvarchar](96) NOT NULL,
    [Url] [nvarchar](512) NOT NULL,
    [LastActivityAt] [datetime] NOT NULL

Categories table:
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [ShortName] [nvarchar](8) NOT NULL,
    [Name] [nvarchar](64) NOT NULL

Users table:
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Username] [nvarchar](32) NOT NULL,
    [Password] [nvarchar](64) NOT NULL,
    [Email] [nvarchar](320) NOT NULL,
    [CreatedAt] [datetime] NOT NULL,
    [LastActivityAt] [datetime] NOT NULL
Currently the database contains 1 user, 1 category and 250,000 stories, and this is the query I ran:
    SELECT TOP(10) *
    FROM Stories
    INNER JOIN Categories ON Categories.ID = Stories.CategoryID
    INNER JOIN Users ON Users.ID = Stories.UserID
    ORDER BY Stories.LastActivityAt
The query takes 52 seconds to run. CPU usage hovers at 2-3% and memory use is 1.1GB with 900MB free, but disk usage seems out of control: it's at 100MB/sec, with two-thirds of that being writes to tempdb.mdf and the rest being reads from tempdb.mdf.
Now for the interesting part…
    SELECT TOP(10) *
    FROM Stories
    INNER JOIN Categories ON Categories.ID = Stories.CategoryID
    INNER JOIN Users ON Users.ID = Stories.UserID

    SELECT TOP(10) *
    FROM Stories
    INNER JOIN Users ON Users.ID = Stories.UserID
    ORDER BY Stories.LastActivityAt

    SELECT TOP(10) *
    FROM Stories
    INNER JOIN Categories ON Categories.ID = Stories.CategoryID
    ORDER BY Stories.LastActivityAt
All 3 of these queries are pretty much instant.
Exec plan for the first query:
http://i43.tinypic.com/xp6gi1.png
Exec plans for the other 3 queries (in order):
http://i43.tinypic.com/30124bp.png
http://i44.tinypic.com/13yjml1.png
http://i43.tinypic.com/33ue7fb.png
Any help would be much appreciated.
Exec plan after adding the indexes (down to 17 seconds again):
http://i39.tinypic.com/2008ytx.png
I've gotten a lot of helpful feedback from everyone, and I thank you. I tried a new angle: I query the stories I need, then get the Categories and Users in separate queries, and with those 3 queries it only took 250ms. I don't understand the underlying issue, but since it works, and at 250ms no less, I'll stick with that for the time being. Here's the code I used to test this:
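    using System;
    using System.Diagnostics;
    using System.Linq;

    DBDataContext db = new DBDataContext();
    Console.ReadLine(); // pause before timing starts
    Stopwatch sw = Stopwatch.StartNew();
    // 1) Top 10 stories by LastActivityAt, no joins
    var stories = db.Stories.OrderBy(s => s.LastActivityAt).Take(10).ToList();
    // 2) Fetch the related rows by their foreign keys (not by story ID)
    var categoryIDs = stories.Select(s => s.CategoryID);
    var userIDs = stories.Select(s => s.UserID);
    var categories = db.Categories.Where(c => categoryIDs.Contains(c.ID)).ToList();
    var users = db.Users.Where(u => userIDs.Contains(u.ID)).ToList();
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);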
Try adding an index on Stories.LastActivityAt. I think the clustered index scan in the execution plan may be due to the sorting.
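A minimal sketch of such an index, assuming SQL Server 2005 or later and that the table lives in the default `dbo` schema. The name `IX_Stories_LastActivityAt` is made up for illustration, and the `INCLUDE` list is an optional extra (not part of the suggestion above) that stores `CategoryID` and `UserID` in the index leaf so the joins can be satisfied without key lookups:

```sql
-- Nonclustered index keyed on the sort column, so TOP(10) ... ORDER BY LastActivityAt
-- can read the first 10 entries in order instead of scanning and sorting every row.
-- The leaf level also carries the clustered key (ID) automatically.
-- INCLUDE (optional, assumed here) adds the two join columns to the leaf.
CREATE NONCLUSTERED INDEX IX_Stories_LastActivityAt
    ON dbo.Stories (LastActivityAt)
    INCLUDE (CategoryID, UserID);
```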
Edit: My query returned instantly while the rows were just a few bytes long, but after I added a 2K varchar column it has been running for 5 minutes and is still going, so I think Mitch has a point. It is the volume of data being shuffled around for nothing, but this can be fixed in the query.
Try putting the join, sort and TOP(10) in a view or in a nested query, and then join back against the Stories table to get the rest of the data for just the 10 rows you need.
Like this:
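A sketch of that nested-query rewrite, assuming the tables and columns listed in the question (the aliases `t`, `st`, `c2`, `u2` are arbitrary). The derived table does the joins, sort and TOP(10) while carrying only the narrow key columns; the outer query then fetches the wide rows for just those 10 IDs:

```sql
-- Inner query: joins (to keep inner-join semantics), sort and TOP(10),
-- returning only the story IDs, so no wide columns are sorted or spilled.
-- Outer query: join back by primary key to fetch the full rows for 10 stories.
SELECT s.*, c.*, u.*
FROM (
    SELECT TOP(10) st.ID
    FROM Stories AS st
    INNER JOIN Categories AS c2 ON c2.ID = st.CategoryID
    INNER JOIN Users AS u2 ON u2.ID = st.UserID
    ORDER BY st.LastActivityAt
) AS t
INNER JOIN Stories AS s ON s.ID = t.ID
INNER JOIN Categories AS c ON c.ID = s.CategoryID
INNER JOIN Users AS u ON u.ID = s.UserID
ORDER BY s.LastActivityAt;  -- the derived table's order is not guaranteed to carry through
```

The outer ORDER BY is still needed: an ORDER BY inside a derived table only determines which rows TOP picks, not the order of the final result.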
If you have an index on LastActivityAt, this should run very fast.