I have an assignment to create a Twitter-like database, and as part of it I have to find the trending topics. My idea was to count the tweets with a specific tag between the date the tweet was made and 7 days later, and order them by the count.
I am using the following two tables for this query:
Table Tweet : id , message, users_id, date
Table Tweet_tags : id, tag, tweet_id
Since MySQL isn't my strong point at all, I'm having trouble getting any results from the query.
The query I tried is:
Select
Count(twitter.tweet_tags.id) As NumberofTweets,
twitter.tweet_tags.tag
From twitter.tweet
Inner Join twitter.tweet_tags On twitter.tweet_tags.tweet_id = twitter.tweet.id
WHERE twitter.tweet_tags.tag between twitter.tweet.date and ADDDATE(twitter.tweet.date, INTERVAL 7 day)
ORDER BY NumberofTweets
The query runs, but gives no results. I just can't get it to work. Could you please help me out with this, or if you have a better way to get the trending topics, please let me know!
Thanks a lot!
This is equivalent to your query, with table aliases to make it easier to read, with BETWEEN replaced by two inequality predicates, and the ADDDATE function replaced with equivalent operation…
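As a sketch, that rewritten form might look something like this. The aliases `t` and `s` are just my choice, and the logic is unchanged from your query, including the comparison of `tag` against `date`:

```sql
SELECT COUNT(s.id) AS NumberofTweets
     , s.tag
  FROM twitter.tweet t
  JOIN twitter.tweet_tags s
    ON s.tweet_id = t.id
    -- BETWEEN written out as two inequality predicates
 WHERE s.tag >= t.date
    -- ADDDATE(t.date, INTERVAL 7 DAY) written as date arithmetic
   AND s.tag <= t.date + INTERVAL 7 DAY
 ORDER BY NumberofTweets
```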
Two things pop out at me here…
First, there is no `GROUP BY`. To get a count by "tag", you want a `GROUP BY tag`.
Second, you are comparing "tag" to "date". I don't know your tables, but that just doesn't look right. (I expect "date" is a DATETIME or TIMESTAMP, and "tag" is a character string, maybe what my daughter calls a "hash tag". Or is that tumblr she's talking about?)
If I understand your requirement:
For each tweet, and for each tag associated with that tweet, you want to get a count of the number of other tweets, that have a matching tag, that are made within 7 days after the datetime of the tweet.
One way to get this result would be to use a correlated subquery. (This is probably the easiest approach to understand, but is probably not the best approach from a performance standpoint).
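A minimal sketch of the correlated subquery form, assuming `tweet.date` is a DATETIME or TIMESTAMP. The aliases (`t`/`s` for the outer tweet and its tag, `r`/`q` for the matching tag and its tweet) and the `cnt` column name are illustrative:

```sql
SELECT t.id
     , s.tag
       -- for each tweet/tag row, count tweets carrying the same tag
       -- that were made within 7 days after this tweet
     , ( SELECT COUNT(1)
           FROM twitter.tweet_tags r
           JOIN twitter.tweet q
             ON q.id = r.tweet_id
          WHERE r.tag   = s.tag
            AND q.date >= t.date
            AND q.date <= t.date + INTERVAL 7 DAY
       ) AS cnt
  FROM twitter.tweet t
  JOIN twitter.tweet_tags s
    ON s.tweet_id = t.id
 ORDER BY cnt DESC
```

The subquery is evaluated for each row of the outer query, which is why this form tends to be slow on large sets.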
Another approach would be to use a join operation:
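Something along these lines, as a sketch under the same assumptions about the tables (aliases `t`/`s` and `r`/`q` and the `cnt` name are again illustrative):

```sql
SELECT t.id
     , s.tag
       -- COUNT(q.id) counts only the rows where a matching tweet was found
     , COUNT(q.id) AS cnt
  FROM twitter.tweet t
  JOIN twitter.tweet_tags s
    ON s.tweet_id = t.id
  LEFT
  JOIN twitter.tweet_tags r
    ON r.tag = s.tag
  LEFT
  JOIN twitter.tweet q
    ON q.id    = r.tweet_id
   AND q.date >= t.date
   AND q.date <= t.date + INTERVAL 7 DAY
 GROUP BY t.id
        , s.tag
 ORDER BY cnt DESC
```

The date conditions sit in the `ON` clause rather than in a `WHERE` clause, so that they restrict which tweets match without discarding rows from `t`/`s`.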
The counts from both of these queries assume that `tweet_tags (tweet_id, tag)` is unique. If there are any "duplicates", then including the DISTINCT keyword, i.e. `COUNT(DISTINCT q.id)` (in place of `COUNT(1)` and `COUNT(q.id)` respectively), would get you the count of "related" tweets.
NOTE: the counts returned will include the original tweet itself.
NOTE: removing the `LEFT` keywords from the query above should return an equivalent result, since the tweet/tag (from t/s) is guaranteed to match itself (from r/q), as long as the tag is not null and the tweet `date` is not null.
Those queries are going to have problematic performance on large sets. Appropriate covering indexes are going to be needed for acceptable performance:
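For example, indexes along these lines would be a reasonable starting point. The index names are made up, and I'm assuming `id` is the primary key of each table; check the `EXPLAIN` output against your own data before settling on anything:

```sql
-- Find all rows for a given tag, with tweet_id available from the index
-- for the join to tweet.
CREATE INDEX tweet_tags_ix1 ON twitter.tweet_tags (tag, tweet_id);

-- Find the tags for a given tweet from the index alone.
CREATE INDEX tweet_tags_ix2 ON twitter.tweet_tags (tweet_id, tag);

-- Covers the id lookup plus the date range check on tweet.
-- With InnoDB and id as the primary key this one may add little,
-- since the primary key lookup already reaches the row.
CREATE INDEX tweet_ix1 ON twitter.tweet (id, `date`);
```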