I’m struggling with a philosophical question about database programming in PHP. In particular, I’m trying to decide when it’s best to read an entire table into an object versus querying MySQL directly whenever I need data.
Is there ever a situation where you’d want to just read the entire database into an object? Where do you draw the line?
For example, if I had a table full of names and phone numbers and needed the phone number for one individual, that’s a simple one-time MySQL query. Reading an entire table into an associative array just to get one phone number sounds ridiculous… But:
(1) What if I need to get the names and phone numbers of 50 individuals? 100? 1000?
(2) When is it more efficient (if ever) to read the entire table into an object? Is performing 1000 MySQL queries for 1000 names always going to be more efficient than reading in the entire table?
(2a) Obviously it would depend on the total number of records in the table. Would it be better to do 1000 queries for 1000 phone numbers, or to read a table of 2000 total records from MySQL into an associative array? What if it was 5000 total records and I needed 1000? What if it was 10k? Etc.
(3) What if I need to do something a little more complex, like return all phone numbers in a certain area code? Obviously in that case I could use a regexp SQL query, but I’m sure I could come up with a more complex case where a simple query doesn’t give me exactly what I want.
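To make (1), (2) and (2a) concrete, here is a minimal sketch of three access patterns side by side: one query per name, a few batched queries using an `IN (...)` list, and loading the whole table. It assumes Python’s built-in `sqlite3` module with an in-memory database as a stand-in for PHP and MySQL, and a hypothetical `phonebook(name, phone)` table; in PHP the same shapes could be written with PDO prepared statements. The chunk size of 500 is an arbitrary assumption, there to stay under per-statement limits on placeholders or statement size.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; "phonebook" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phonebook (name TEXT PRIMARY KEY, phone TEXT)")
conn.executemany(
    "INSERT INTO phonebook (name, phone) VALUES (?, ?)",
    [(f"person{i}", f"555-{i:04d}") for i in range(5000)],
)

wanted = [f"person{i}" for i in range(0, 2000, 2)]  # 1000 of the 5000 names


def one_query_per_name(conn, names):
    """Option A: one query (one round trip) per name."""
    result = {}
    for name in names:
        row = conn.execute(
            "SELECT phone FROM phonebook WHERE name = ?", (name,)
        ).fetchone()
        if row is not None:
            result[name] = row[0]
    return result


def batched_queries(conn, names, chunk_size=500):
    """Option B: a few queries, each with an IN (...) list of placeholders."""
    result = {}
    for start in range(0, len(names), chunk_size):
        chunk = names[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT name, phone FROM phonebook WHERE name IN ({placeholders})",
            chunk,
        )
        result.update(rows)  # each row is a (name, phone) pair
    return result


def load_whole_table(conn, names):
    """Option C: read every row into a dict, then keep only the names needed."""
    everything = dict(conn.execute("SELECT name, phone FROM phonebook"))
    return {n: everything[n] for n in names if n in everything}


a = one_query_per_name(conn, wanted)
b = batched_queries(conn, wanted)
c = load_whole_table(conn, wanted)
print(a == b == c, len(b))  # prints: True 1000
```

All three return the same mapping; which one is fastest against a real MySQL server depends on network latency, table size and indexes, so timing them on your own data is the way to settle (2a) for a given workload.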
I guess what I’m getting at is that, as a developer, you have several knobs you can turn to optimize your application. Obviously you want to think about the data you’re using and optimize the database model to match the types of data requests you’ll be doing. But sometimes the requirements conflict, and you’re forced to optimize your data model for one scenario at the expense of another, competing one.
Any thoughts?
Databases are designed to be efficient at locating and returning exactly the data that you need to work with for a particular operation.
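One mechanism behind this is indexing. As an illustrative sketch, again using Python’s `sqlite3` as a stand-in for MySQL (MySQL’s `EXPLAIN` reports similar information in a different format), this prints the query plan for a single-name lookup on a hypothetical `phonebook` table before and after adding an index on `name`; the index name `idx_phonebook_name` is also hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phonebook (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO phonebook (name, phone) VALUES (?, ?)",
    [(f"person{i}", f"555-{i:04d}") for i in range(10000)],
)


def plan(sql, params=()):
    """Return the human-readable 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


lookup = "SELECT phone FROM phonebook WHERE name = ?"
before = plan(lookup, ("person42",))  # no index on name yet: expect a full scan
conn.execute("CREATE INDEX idx_phonebook_name ON phonebook (name)")
after = plan(lookup, ("person42",))   # expect a search using idx_phonebook_name
print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions; what matters is the change from scanning every row to searching the index for the one row requested.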
Transferring data over a network connection is orders of magnitude slower than processing it on the machine where it resides. Use databases for what they’re good at: holding lots of information and letting application code query and work with exactly the subset of that data it needs at a given point in time.
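Applied to the area-code case in (3), here is a sketch of letting the database do the filtering, so only matching rows cross the connection, next to the fetch-everything-and-filter alternative for comparison. It assumes `sqlite3` in place of MySQL, a hypothetical `phonebook` table, and phone numbers stored as `AAA-NNN-NNNN`; a prefix `LIKE` pattern is used here instead of a regexp.

```python
import sqlite3

# Assumption: hypothetical table; phone numbers stored as "AAA-NNN-NNNN".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phonebook (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO phonebook (name, phone) VALUES (?, ?)",
    [
        ("Alice", "212-555-0101"),
        ("Bob", "415-555-0102"),
        ("Carol", "212-555-0103"),
        ("Dave", "646-555-0104"),
    ],
)


def numbers_in_area_code(conn, area_code):
    """Filter in SQL: only matching rows are returned to the application."""
    return conn.execute(
        "SELECT name, phone FROM phonebook WHERE phone LIKE ? ORDER BY name",
        (area_code + "-%",),
    ).fetchall()


def numbers_in_area_code_app_side(conn, area_code):
    """Alternative for comparison: fetch every row, then filter in Python."""
    rows = conn.execute("SELECT name, phone FROM phonebook ORDER BY name").fetchall()
    return [(name, phone) for name, phone in rows if phone.startswith(area_code + "-")]


print(numbers_in_area_code(conn, "212"))
# [('Alice', '212-555-0101'), ('Carol', '212-555-0103')]
```

Both functions return the same rows; the difference is how much data is transferred and then processed in application code. In MySQL, a prefix pattern such as `'212-%'` (one that does not start with a wildcard) can use a B-tree index on `phone`, which a regexp match generally cannot.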
If you find that you need to access the same data over and over, caching it at the application layer or in a dedicated caching solution like memcached does make sense, but I cannot imagine a scenario where it makes sense to read in a whole table just because my application logic needs to process a subset of its rows and/or columns.
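A minimal cache-aside sketch of the application-layer caching mentioned above: answer from the cache when possible and query the database only on a miss. A plain Python dict stands in for memcached (an assumption; a shared cache would also need expiry), `sqlite3` stands in for MySQL, and `PhoneLookup`, `phone_for` and `invalidate` are hypothetical names.

```python
import sqlite3


class PhoneLookup:
    """Cache-aside lookup: answer from the cache if possible, else query and store.

    Assumption: a plain dict stands in for memcached or another shared cache;
    it has no expiry, so stale entries must be removed with invalidate().
    """

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.db_queries = 0  # number of times the database was actually queried

    def phone_for(self, name):
        if name in self.cache:
            return self.cache[name]
        self.db_queries += 1
        row = self.conn.execute(
            "SELECT phone FROM phonebook WHERE name = ?", (name,)
        ).fetchone()
        phone = row[0] if row is not None else None
        self.cache[name] = phone  # misses are cached too, as None
        return phone

    def invalidate(self, name):
        """Forget a cached entry, e.g. after that person's row is updated."""
        self.cache.pop(name, None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phonebook (name TEXT PRIMARY KEY, phone TEXT)")
conn.execute("INSERT INTO phonebook VALUES ('Alice', '212-555-0101')")

lookup = PhoneLookup(conn)
for _ in range(3):
    lookup.phone_for("Alice")
print(lookup.db_queries)  # prints 1: only the first call reached the database
```

Only data that is actually requested ends up in the cache, which keeps the "query exactly the subset you need" approach while avoiding repeated round trips for hot rows.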