I am using an ADO.NET Entity Data Model to query a MySQL database. I was very happy with its implementation and usage, so I decided to see what would happen if I queried 1 million records. It turns out to have serious performance issues, and I don’t understand why.
The system hangs for some time and then I get either
- A deadlock exception
- A MySQL exception
My code is as follows:
    try
    {
        // works very fast
        var data = from employees in dataContext.employee_table
                       .Include("employee_type")
                       .Include("employee_status")
                   orderby employees.EMPLOYEE_ID descending
                   select employees;
        // This hangs the system and causes some deadlock exception
        IList<employee_table> result = data.ToList<employee_table>();
        return result;
    }
    catch (Exception ex)
    {
        throw new MyException("Error in fetching all employees", ex);
    }
My question is why is ToList() taking such a long time?
Also how can I avoid this exception and what is the ideal way to query a million records?
The ideal way to query a million records would be to use an IQueryable<T> to make sure that you aren’t actually executing a query on the database until you need the actual data. I highly doubt that you need a million records at once.
The reason that it is deadlocking is that you are asking the MySQL server to pull those million records from the database, sort them by EMPLOYEE_ID, and then hand all of that back to your program. So I imagine that the deadlocks come from your program waiting for that to finish and then reading the results into memory. The MySQL problems are probably related to timeout issues.
The reason that the var data section works quickly is that you haven’t actually done anything yet; you’ve just constructed the query. When you call ToList(), the SQL is executed and the results are read. This is known as deferred execution (it is often loosely called lazy loading, but in Entity Framework that term refers to loading related entities on demand).
I would suggest trying this as follows:
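A minimal sketch of that idea, run against an in-memory list so it works without a database. The Employee class and its properties are hypothetical stand-ins; against the real model the source would be dataContext.employee_table with the same Include calls, and the variable would be typed as IQueryable<employee_table>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the employee_table entity.
class Employee
{
    public int EmployeeId { get; set; }
    public string LastName { get; set; } = "";
}

static class Program
{
    static void Main()
    {
        // In-memory stand-in for dataContext.employee_table (assumption).
        IQueryable<Employee> table = new List<Employee>
        {
            new Employee { EmployeeId = 1, LastName = "Adams" },
            new Employee { EmployeeId = 2, LastName = "Smith" },
        }.AsQueryable();

        // This only builds the query; nothing is enumerated yet.
        IQueryable<Employee> data = from e in table
                                    orderby e.EmployeeId descending
                                    select e;

        // The query is evaluated here, when it is enumerated.
        foreach (Employee e in data)
        {
            Console.WriteLine(e.EmployeeId + " " + e.LastName);
        }
    }
}
```

The point is to keep passing the IQueryable<T> around and narrow it down before anything enumerates it.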
Then, when you actually need something from the list, filter the query first and call ToList() only on the filtered result.
For example, you might need only the employee with ID 10.
Or you might need all the employees whose last names start with S (I don’t know if your table has a last name column; it is just an example).
Or you might want to page through all of the employees in chunks of 100.
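The three cases above could be sketched roughly like this. The Employee class, its property names, and the fake rows are assumptions made so the example runs on its own; with Entity Framework the same Where, Skip and Take calls would be applied to the query built from dataContext.employee_table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; the real one would be employee_table.
class Employee
{
    public int EmployeeId { get; set; }
    public string LastName { get; set; } = "";
}

static class Program
{
    static void Main()
    {
        // 250 fake rows standing in for the real table (assumption).
        List<Employee> rows = Enumerable.Range(1, 250)
            .Select(i => new Employee
            {
                EmployeeId = i,
                LastName = (i % 2 == 0 ? "Smith" : "Adams") + i
            })
            .ToList();

        IQueryable<Employee> data = rows.AsQueryable()
            .OrderByDescending(e => e.EmployeeId);

        // The employee with ID 10: only the matching row is materialized.
        List<Employee> byId = data.Where(e => e.EmployeeId == 10).ToList();

        // Everyone whose last name starts with S.
        List<Employee> startsWithS = data
            .Where(e => e.LastName.StartsWith("S"))
            .ToList();

        // One page of 100 rows; pages are zero-based, so this is the third page.
        int pageSize = 100;
        int page = 2;
        List<Employee> thirdPage = data
            .Skip(page * pageSize)
            .Take(pageSize)
            .ToList();

        Console.WriteLine(byId.Count);        // 1
        Console.WriteLine(startsWithS.Count); // 125
        Console.WriteLine(thirdPage.Count);   // 50 (only 250 rows exist)
    }
}
```

Note that Entity Framework expects the query to be ordered before Skip is applied; the orderby in the original query already takes care of that.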
If you want to defer your database calls even further, you can skip ToList() and just use the iterator when you need it. Let’s say you want to add up all of the salaries of the people whose names start with A. Iterating over the filtered query sends the database a single query restricted to those rows.
The result is a very fast query that gets the information only when you need it, and only the information that you need.
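Here is a small sketch of the salary example, again against in-memory data so it is runnable as is. The Employee class and the Salary and LastName properties are hypothetical; I don’t know the real column names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity with an assumed Salary column.
class Employee
{
    public string LastName { get; set; } = "";
    public decimal Salary { get; set; }
}

static class Program
{
    static void Main()
    {
        // In-memory stand-in for the employee table (assumption).
        IQueryable<Employee> data = new List<Employee>
        {
            new Employee { LastName = "Adams", Salary = 1000m },
            new Employee { LastName = "Allen", Salary = 1500m },
            new Employee { LastName = "Smith", Salary = 2000m },
        }.AsQueryable();

        // Still just a query: filter the rows and project only the salary.
        IQueryable<decimal> salaries = data
            .Where(e => e.LastName.StartsWith("A"))
            .Select(e => e.Salary);

        // The query runs here, while the loop iterates; no ToList() needed.
        decimal total = 0m;
        foreach (decimal salary in salaries)
        {
            total += salary;
        }

        Console.WriteLine(total); // 2500
    }
}
```

With a real Entity Framework provider, the Where and Select would be translated into SQL, so the database would do the filtering and return only the salary column. The exact SQL depends on the provider, so treat that as the general shape rather than a guarantee.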
If for some crazy reason you really did want to query all of the million records from the database then, ignoring the fact that this would eat up a massive amount of system resources, I would suggest doing it in chunks. You would probably need to play around with the chunk size to get the best performance.
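A rough sketch of the chunked approach, under the same assumptions as before: Employee is a hypothetical entity, the 1,050 in-memory rows stand in for the real table, and the chunk size of 100 is just a starting value to tune.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; the real one would be employee_table.
class Employee
{
    public int EmployeeId { get; set; }
}

static class Program
{
    static void Main()
    {
        // 1,050 fake rows standing in for the real table (assumption).
        List<Employee> rows = Enumerable.Range(1, 1050)
            .Select(i => new Employee { EmployeeId = i })
            .ToList();

        // A stable ordering keeps the chunks from overlapping.
        IQueryable<Employee> data = rows.AsQueryable()
            .OrderBy(e => e.EmployeeId);

        const int chunkSize = 100; // tune this for your database
        var allEmployees = new HashSet<Employee>();
        int skipped = 0;

        while (true)
        {
            // Each iteration is one small query instead of one huge one.
            List<Employee> chunk = data.Skip(skipped).Take(chunkSize).ToList();
            if (chunk.Count == 0)
            {
                break; // no more rows
            }

            allEmployees.UnionWith(chunk);
            skipped += chunk.Count;
        }

        Console.WriteLine(allEmployees.Count); // 1050
    }
}
```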
The general idea is to do smaller queries to avoid timeout issues from the database.
I chose to use a HashSet<T> since they are designed for large sets of data, but I don’t know what performance would look like with 1,000,000 objects.