Please forgive my ignorance here. SQL is decidedly one of the biggest “gaps” in my education that I’m working on correcting, come October. Here’s the scenario:
I have two tables in a DB that I need to access certain data from. One is users, and the other is conversation_log. The basic structure is outlined below:
users:
- id (INT)
- name (TXT)
conversation_log
- userid (INT) // same value as id in users – actually the only field in this table I want to check
- input (TXT)
- response (TXT)
(note that I’m only listing the structure for the fields that are {or could be} relevant to the current challenge)
What I want to do is return a list of names from the users table that have at least one record in the conversation_log table. Currently, I’m doing this with two separate SQL statements, with the one that checks for records in conversation_log being called hundreds, if not thousands of times, once for each userid, just to see if records exist for that id.
Currently, the two SQL statements are as follows:
select id from users where 1; (gets the list of userid values for the next query)
select id from conversation_log where userid = $userId limit 1; (checks for existing records)
Right now I have 4,000+ users listed in the users table. I’m sure that you can imagine just how long this method takes. I know there’s an easier, more efficient way to do this, but being self-taught, this is something that I have yet to learn. Any help would be greatly appreciated.
You have to do what is called a ‘join’. A join combines the rows of two tables based on values they have in common, here users.id and conversation_log.userid.
See if this makes sense to you:
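Here is a minimal sketch of that kind of query, wrapped in Python's sqlite3 module so it can be run as-is. The table and column names come from your question; the sample rows (alice, bob, carol) are made up for illustration, and your real database is probably MySQL rather than SQLite, though the SELECT itself is standard SQL.

```python
import sqlite3

# In-memory database mirroring the two tables from the question.
# The sample rows are hypothetical and exist only for this demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE conversation_log (userid INTEGER, input TEXT, response TEXT);
    INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO conversation_log (userid, input, response) VALUES
        (1, 'hi', 'hello'),
        (1, 'bye', 'goodbye'),
        (3, 'hey', 'hello');
""")

# One query replaces the per-user loop: join the tables on the shared id
# and keep each matching name once.
QUERY = """
    SELECT DISTINCT users.name
    FROM users
    JOIN conversation_log ON conversation_log.userid = users.id
    ORDER BY users.name
"""

names = [row[0] for row in conn.execute(QUERY)]
print(names)  # bob has no conversation_log rows, so he is not listed
```

The point is that the database does the matching in a single statement, instead of your code issuing one query per user.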
Now JOIN by itself is an “inner join”, which means that it only returns rows where the ON condition finds a match in both tables. In other words, if a user’s id never appears as a conversation_log.userid, the query won’t return any part of a row, user or conversation log, for that user.
Also, +1 for having a clearly worded question : )
EDIT: I added a “DISTINCT”, which filters out all of the duplicates. If a user appeared in more than one conversation_log row, and you didn’t have DISTINCT, you would get the user’s name more than once. This is because JOIN returns one row for every pair of rows from the two tables that matches your JOIN ON criteria (conceptually, a cartesian product of the two tables filtered by the ON condition).
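To see the duplicate effect concretely, here is a small sketch, again using Python's sqlite3 with made-up sample rows, that runs the same join with and without DISTINCT. The user 'alice' has two conversation_log rows in this hypothetical data.

```python
import sqlite3

# Hypothetical sample data: alice has two log rows, carol one, bob none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE conversation_log (userid INTEGER, input TEXT, response TEXT);
    INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO conversation_log (userid, input, response) VALUES
        (1, 'hi', 'hello'),
        (1, 'bye', 'goodbye'),
        (3, 'hey', 'hello');
""")

JOIN_CLAUSE = """
    FROM users
    JOIN conversation_log ON conversation_log.userid = users.id
    ORDER BY users.name
"""

# Without DISTINCT: one output row per matching (user, log row) pair.
plain = [r[0] for r in conn.execute("SELECT users.name" + JOIN_CLAUSE)]

# With DISTINCT: identical output rows are collapsed into one.
unique = [r[0] for r in conn.execute("SELECT DISTINCT users.name" + JOIN_CLAUSE)]

print(plain)   # alice appears twice
print(unique)  # alice appears once
```

One thing to keep in mind: DISTINCT compares the selected columns only, so two different users who happen to share a name would collapse into one row. Selecting users.id alongside users.name avoids that.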