I have a question about retrieving multiple records from a database.
Case 1:
- Apply the join, select all records (only the selected columns) from the DB, and apply the filter condition in the web server.
Case 2:
- Apply the join, apply the filter condition and do the calculations in the DB; the web server does nothing except display the data.
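To make the two cases concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers`/`orders` schema, the sample rows and the function names are hypothetical, chosen only for illustration, and SQLite stands in for whatever DB server is actually in use. Both functions compute the same per-customer totals, but Case 1 pulls every joined row out of the database before filtering, while Case 2 returns only the final result rows.

```python
import sqlite3

def make_db():
    # Hypothetical schema and data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id),
                             amount REAL);
        INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5),
                                  (4, 3, 2.5), (5, 2, 1.0);
    """)
    return conn

def case1_filter_in_app(conn, region):
    # Case 1: join in the DB, ship every joined row to the web server,
    # then filter and total the amounts in application code.
    rows = conn.execute("""
        SELECT c.id, c.region, o.amount
        FROM customers c JOIN orders o ON o.customer_id = c.id
    """).fetchall()
    totals = {}
    for cust_id, cust_region, amount in rows:
        if cust_region == region:
            totals[cust_id] = totals.get(cust_id, 0.0) + amount
    # Return the result plus how many rows had to leave the database.
    return sorted(totals.items()), len(rows)

def case2_filter_in_db(conn, region):
    # Case 2: join, filter (WHERE) and calculate (SUM ... GROUP BY) in the DB,
    # so only the final per-customer totals are transferred.
    rows = conn.execute("""
        SELECT c.id, SUM(o.amount)
        FROM customers c JOIN orders o ON o.customer_id = c.id
        WHERE c.region = ?
        GROUP BY c.id
        ORDER BY c.id
    """, (region,)).fetchall()
    return rows, len(rows)

conn = make_db()
app_result, app_rows_fetched = case1_filter_in_app(conn, "EU")
db_result, db_rows_fetched = case2_filter_in_db(conn, "EU")
print(app_result, app_rows_fetched)
print(db_result, db_rows_fetched)
```

With this toy data, Case 1 fetches all 5 joined rows to produce 2 totals, while Case 2 fetches only the 2 result rows; on a real table the gap grows with the number of rows the filter discards.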
Of these two approaches, which one is advisable?
My thinking is:
- In Case 1, there is not much work for the DB, but the network carries much more data and the web server has to do more work.
- In Case 2, the DB does more work, but the network carries fewer records than in Case 1, and the web server does less work.
Both the web server and the DB server are scalable (let's say my DB size will not exceed 50 GB).
So where should I do the filtering and calculations to improve performance, and why?
This is a fairly abstract question; a more concrete example would help.
However, if by filtering you mean removing unwanted items from the result set, that’s a case ideally suited to a “where” clause in SQL.
The database is MUCH faster at this than your web server, because it can use indexes to find the matching rows without reading every row. Because the database also streams less data to your web servers, the overall performance of your database is likely to improve: at high traffic volumes, I/O becomes a significant bottleneck, and streaming large amounts of data to the web servers will hurt your I/O performance.
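The indexing point can be checked directly. The sketch below again assumes SQLite (via Python's `sqlite3`) and a hypothetical `orders` table; it asks the planner how it would run the same filtered query before and after an index on the filter column is created. The exact plan wording differs between SQLite versions, and other databases expose the same idea through their own `EXPLAIN` commands, but the first plan is a full table scan and the second is an index lookup.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    # EXPLAIN QUERY PLAN returns one row per plan step; the last column
    # is the human-readable description of that step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# Hypothetical table with 10,000 orders spread over 1,000 customers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(i % 1000, float(i)) for i in range(10_000)])

sql = "SELECT id, amount FROM orders WHERE customer_id = ?"

# No index yet: the planner has to scan the whole table.
before = query_plan(conn, sql, (42,))

# With an index on the filter column, it can jump straight to matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = query_plan(conn, sql, (42,))

print(before)
print(after)
```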
There are cases where this is hard to do, notably where the filters change dynamically. Constructing valid SQL where clauses in your web server logic can be a little tricky, for instance if you want to let users construct filters on the fly. Even then, I’d still recommend doing the filtering in SQL.
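For dynamic filters, one common approach is to map each user-selectable filter to a fixed column and operator, and to pass only the user's values as bound parameters, so user input never becomes SQL text. This is a minimal sketch assuming SQLite via Python's `sqlite3` and a hypothetical `customers`/`orders` schema; `ALLOWED_FILTERS` and `build_query` are illustrative names, not part of any library.

```python
import sqlite3

# Hypothetical whitelist: filter name -> (SQL column, comparison operator).
# Column names cannot be bound as parameters, so they come only from this
# fixed mapping; user-supplied values are always passed as parameters.
ALLOWED_FILTERS = {
    "region": ("c.region", "="),
    "min_amount": ("o.amount", ">="),
    "max_amount": ("o.amount", "<="),
}

def build_query(filters):
    clauses, params = [], []
    for name, value in filters.items():
        if name not in ALLOWED_FILTERS:
            raise ValueError(f"unsupported filter: {name!r}")
        column, op = ALLOWED_FILTERS[name]
        clauses.append(f"{column} {op} ?")
        params.append(value)
    sql = ("SELECT o.id, c.region, o.amount "
           "FROM customers c JOIN orders o ON o.customer_id = c.id")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY o.id", params

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5),
                              (4, 3, 2.5), (5, 2, 1.0);
""")

# Filters as they might arrive from a user-built search form.
sql, params = build_query({"region": "EU", "min_amount": 5.0})
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Rejecting unknown filter names instead of interpolating them keeps the filtering in the database without opening the door to SQL injection.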