I’m afraid I’m no great shakes at SQL, so I’m not surprised I’m having trouble with this, but if you could help me get it to work (doesn’t even have to be one query), I’d be grateful. I’m trying to analyze some Twitter data using MySQLdb in Python, and I’m running:
for u_id in list:
    """
    select e.user_id
    from table_entities e
    inner join table_tweets t on e.id = t.id
    where e.type='mention' and t.user_id=%s
    group by e.type having count('hashtag') < 3
    """ % (u_id)
(python syntax faked slightly to not show the unimportant stuff)
now, everything before the “group by” statement works fine. I’m able to extract user_ids mentioned in a given tweet (id is the PK for table_tweets, whereas there’s another row in table_entities for each mention, hashtag, or URL) matching the current position of my loop.
however (and I don’t think I’m formatting it anywhere near correctly), the group by statement doesn’t do a thing. what I mean to do is exclude all user_ids belonging to tweets (ids) that have 3 or more entries in table_entities with type=hashtag. I can sort of tell it’s not going to work as it is, since it doesn’t actually refer to the id column, but every way I’ve tried to do that (e.g. by trying to make it part of the join clause) throws a syntax error.
advice is appreciated!
This doesn’t really do what you want.
Because e.user_id is in the SELECT clause and not in the GROUP BY, MySQL will select one arbitrary user_id for each e.type.
HAVING COUNT('literalString') is the equivalent of HAVING COUNT(*). You can see this yourself by moving the COUNT('hashtag') to the SELECT clause.
Here’s a live demo of these points.
The result is that your query will only return records if there are fewer than 3 mentions for the user.
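To check the COUNT('literalString') point locally, here is a minimal sketch. It assumes an in-memory SQLite database (Python's standard sqlite3 module) as a stand-in for the MySQL server, and the column layout and sample rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table layout and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_entities (id INTEGER, type TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO table_entities VALUES (?, ?, ?)",
    [
        (1, "mention", 10),
        (1, "mention", 11),
        (1, "hashtag", None),
        (2, "mention", 12),
    ],
)

# COUNT('hashtag') counts the rows where the constant 'hashtag' is non-NULL,
# which is every row in the group, so it matches COUNT(*) for each type.
counts = conn.execute(
    "SELECT type, COUNT('hashtag'), COUNT(*) "
    "FROM table_entities GROUP BY type ORDER BY type"
).fetchall()
print(counts)  # [('hashtag', 1, 1), ('mention', 3, 3)]
```

The literal never filters anything: the 'mention' group reports 3 even though none of its rows is a hashtag.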
There are many ways to accomplish what you’re trying to do. I chose IN (you could also use EXISTS or an INNER JOIN to a subquery).
The subselect finds all user ids that have fewer than 3 records in table_entities with an e.type of “hashtag” and the user that matches %s.
The main select filters for ‘mentions’ and the user id again. This allows you to select for one e.type while filtering on a count of another e.type.
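As a rough sketch of the subquery idea applied to the goal stated in the question (drop mentions from tweets with 3 or more hashtags), the following is one possible shape, not a definitive query. It is a variation on IN: it uses NOT IN over tweet ids with COUNT(*) >= 3, so tweets with no hashtag rows at all are still kept. Assumptions: SQLite via the standard sqlite3 module stands in for MySQL, the column layout and sample rows are hypothetical, and mentions_for is an illustrative helper name. With MySQLdb the placeholder would be %s, passed as a parameter to cursor.execute rather than interpolated with the % operator.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_tweets (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.execute("CREATE TABLE table_entities (id INTEGER, type TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO table_tweets VALUES (?, ?)",
    [(100, 1), (101, 1), (102, 1), (200, 2)],
)
conn.executemany(
    "INSERT INTO table_entities VALUES (?, ?, ?)",
    [
        (100, "mention", 10), (100, "mention", 11), (100, "hashtag", None),
        (101, "mention", 12),
        (101, "hashtag", None), (101, "hashtag", None), (101, "hashtag", None),
        (102, "mention", 13),
        (200, "mention", 14),
    ],
)

QUERY = """
SELECT e.user_id
FROM table_entities e
INNER JOIN table_tweets t ON e.id = t.id
WHERE e.type = 'mention'
  AND t.user_id = ?
  AND e.id NOT IN (
      -- tweet ids that carry 3 or more hashtag entities
      SELECT h.id
      FROM table_entities h
      WHERE h.type = 'hashtag'
      GROUP BY h.id
      HAVING COUNT(*) >= 3
  )
ORDER BY e.user_id
"""

def mentions_for(connection, author_id):
    """Return mentioned user ids from the author's tweets with fewer than 3 hashtags."""
    rows = connection.execute(QUERY, (author_id,)).fetchall()
    return [row[0] for row in rows]

print(mentions_for(conn, 1))  # [10, 11, 13]
```

Tweet 101 has three hashtags, so its mention of user 12 is dropped, while tweet 102 has none and its mention of user 13 is kept.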