I consider myself fairly competent in understanding and manipulating C-ish languages; it’s not a problem for me to come up with an algorithm and implement it in any C-ish language.
I have tremendous difficulty writing SQL (in my specific case, MySQL) queries. For very simple queries, it isn’t a problem, but for complex queries, I become frustrated not knowing where to start. Reading the MySQL documentation is difficult, mainly because the syntax description and explanation isn’t organized very well.
For example, the SELECT documentation is all over the map: it starts out with what looks like psuedo-BNF, but then (since the text for aggregate descriptions aren’t clickable… like select_expr) it quickly devolves into this frustrating exercise of trying to piece the syntax together yourself by having a number of browser windows open.
Enough whining.
I’d like to know how people, step by step, begin constructing a complex MySQL query. Here is a specific example. I have three tables below. I want to SELECT a set of rows with the following characteristics:
From the userInfo and userProgram tables, I want to select the userName, isApproved, and modifiedTimestamp fields and UNION them into one set. From this set I want to ORDER by modifiedTimestamp taking the MAX(modifiedTimestamp) for every user (i.e. there should be only one row with a unique userName and the timestamp associated with that username should be as high as possible).
From the user table, I want to match the firstName and lastName that is associated with the userName so that it looks something like this:
+-----------+----------+----------+-------------------+
| firstName | lastName | userName | modifiedTimestamp |
+-----------+----------+----------+-------------------+
| JJ | Prof | jjprofUs | 1289914725 |
| User | 2 | user2 | 1289914722 |
| User | 1 | user1 | 1289914716 |
| User | 3 | user3 | 1289914713 |
| User | 4 | user4 | 1289914712 |
| User | 5 | user5 | 1289914711 |
+-----------+----------+----------+-------------------+
The closest I’ve got is a query that looks like this:
(SELECT firstName, lastName, user.userName, modifiedTimestamp
FROM user, userInfo
WHERE user.userName=userInfo.userName)
UNION
(SELECT firstName, lastName, user.userName, modifiedTimestamp
FROM user, userProgram
WHERE user.userName=userProgram.userName)
ORDER BY modifiedTimestamp DESC;
I feel like I’m pretty close but I don’t know where to go from here or even if I’m thinking about this in the right way.
> user
+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| userName | char(8) | NO | PRI | NULL | |
| firstName | varchar(255) | NO | | NULL | |
| lastName | varchar(255) | NO | | NULL | |
| email | varchar(255) | NO | UNI | NULL | |
| avatar | varchar(255) | YES | | '' | |
| password | varchar(255) | NO | | NULL | |
| passwordHint | text | YES | | NULL | |
| access | int(11) | NO | | 1 | |
| lastLoginTimestamp | int(11) | NO | | -1 | |
| isActive | tinyint(4) | NO | | 1 | |
+--------------------+--------------+------+-----+---------+-------+
> userInfo
+-------------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+------------+------+-----+---------+-------+
| userName | char(8) | NO | MUL | NULL | |
| isApproved | tinyint(4) | NO | | 0 | |
| modifiedTimestamp | int(11) | NO | | NULL | |
| field | char(255) | YES | | NULL | |
| value | text | YES | | NULL | |
+-------------------+------------+------+-----+---------+-------+
> userProgram
+-------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| userName | char(8) | NO | PRI | NULL | |
| isApproved | tinyint(4) | NO | PRI | 0 | |
| modifiedTimestamp | int(11) | NO | | NULL | |
| name | varchar(255) | YES | | NULL | |
| address1 | varchar(255) | YES | | NULL | |
| address2 | varchar(255) | YES | | NULL | |
| city | varchar(50) | YES | | NULL | |
| state | char(2) | YES | MUL | NULL | |
| zip | char(10) | YES | | NULL | |
| phone | varchar(25) | YES | | NULL | |
| fax | varchar(25) | YES | | NULL | |
| ehsChildren | int(11) | YES | | NULL | |
| hsChildren | int(11) | YES | | NULL | |
| siteCount | int(11) | YES | | NULL | |
| staffCount | int(11) | YES | | NULL | |
| grantee | varchar(255) | YES | | NULL | |
| programType | varchar(255) | YES | | NULL | |
| additional | text | YES | | NULL | |
+-------------------+--------------+------+-----+---------+-------+
For what I understand from your question, you seem to need a correlated query, which would look like this:
So, do I proceed to get to this result? Key is: express clearly the information you want to retrieve, without taking mental shortcuts.
Step 1: Choose the fields I need in the different tables of my database. That’s what is between SELECT and FROM. Seems obvious, but it becomes less obvious when it comes to aggregation function like sums or counts. In that case, you have to say, for example “I need the count of lines in userInfo for each firstName”. See below in GROUP BY.
Step 2: Knowing the field you need, write the joins between the different corresponding tables. That’s an easy one…
Step 3: Express your conditions. It can be easy, like if you want data from user for userName=”RZEZDFGBH”, or more complicated, like in your case: the way to formulate it so you can get the thing done, if you want only the most recent modifiedtimestamp, is “so that the modifiedtimestamp is equal to the most recent modifiedtimestamp” (that’s where you can easily take a mental shortcut and miss the point)
Step 4: If you have aggregates, it’s time to set the GROUP BY statement. For example, if you count all line in userInfo for each firstName, you would write “GROUP BY firstName”:
This gives you the number of entries in the table for each different firstName.
Step 5: HAVING conditions. These are conditions on the aggregates. In the previous example, if you wanted only the data for the firstName having more than 5 lines in the table, you could write
SELECT firstName,count(*) FROM userInfo GROUP BY firstName HAVING count(*)>5Step 6: Sort with ORDER BY. Pretty easy…
That’s only a short summary. There is much, much more to discover, but it would be too long to write an entire SQL course here… Hope it helps, though!