(This is quite a long post, but I think the problem is easy to solve, and I have a SQLFiddle ready.) Please consider the following table:
----------------------------------------------------------------------------
tweet_id  sp100_id  nyse_date   user_id  class_id  retweets  quality  follow
----------------------------------------------------------------------------
1         1         2011-03-12  1        1         0         2.50     5.00
2         1         2011-03-13  1        2         2         2.50     5.00
3         1         2011-03-13  1        2         1         2.50     5.00
4         1         2011-03-13  2        2         0         0.75     1.00
5         1         2011-03-13  2        3         3         0.75     1.00
6         2         2011-03-12  2        2         3         0.75     1.00
7         2         2011-03-12  2        2         0         0.75     1.00
8         2         2011-03-12  1        3         5         2.50     5.00
9         2         2011-03-13  2        2         0         0.75     1.00
----------------------------------------------------------------------------
The desired output from this table is a list that gives, per sp100_id and per nyse_date, the positive (class_id = 2) and negative (class_id = 3) tweets weighted by retweets, quality and follow:
--------------------------------------------------------------------------------------
sp100_id  nyse_date   pos-rt  pos-quality  pos-follow  neg-rt  neg-quality  neg-follow
--------------------------------------------------------------------------------------
1         2011-03-11  0       0            0           0       0            0
1         2011-03-12  0       0            0           0       0            0
1         2011-03-13  3 (1)   5.75 (2)     11.00 (3)   3 (4)   0.75         1.00
2         2011-03-11  0       0            0           0       0            0
2         2011-03-12  3       1.50         2.00        5       2.50         5.00
2         2011-03-13  0       0.75         1.00        0       0            0
--------------------------------------------------------------------------------------
On 2011-03-13, sp100_id 1 has 3 positive tweets:
(1) 1 tweet retweeted 2 times, 1 tweet retweeted 1 time and 1 tweet retweeted 0 times:
1 x 2 + 1 x 1 + 1 x 0 = 3
(2) 2 tweets with quality 2.50 and 1 tweet with quality 0.75:
2 x 2.50 + 1 x 0.75 = 5.75
(3) 2 tweets with follow 5.00 and 1 tweet with follow 1.00:
2 x 5.00 + 1 x 1.00 = 11.00
On 2011-03-13, sp100_id 1 has 1 negative tweet:
(4) 1 tweet retweeted 3 times: 1 x 3 = 3
The other rows are computed the same way.
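As a cross-check of the weighting rules in (1) to (4), here is a minimal Python sketch that applies them to the sample rows above. It is an illustration only, not the SQL from the SQLFiddle, and the helper name weighted_sums is hypothetical. It groups by (sp100_id, nyse_date), so a date with no tweets at all, such as 2011-03-11, does not appear; that gap is what joining to the daterange table fills.

```python
from collections import defaultdict

# Rows copied from the sample tweets table:
# (tweet_id, sp100_id, nyse_date, user_id, class_id, retweets, quality, follow)
TWEETS = [
    (1, 1, "2011-03-12", 1, 1, 0, 2.50, 5.00),
    (2, 1, "2011-03-13", 1, 2, 2, 2.50, 5.00),
    (3, 1, "2011-03-13", 1, 2, 1, 2.50, 5.00),
    (4, 1, "2011-03-13", 2, 2, 0, 0.75, 1.00),
    (5, 1, "2011-03-13", 2, 3, 3, 0.75, 1.00),
    (6, 2, "2011-03-12", 2, 2, 3, 0.75, 1.00),
    (7, 2, "2011-03-12", 2, 2, 0, 0.75, 1.00),
    (8, 2, "2011-03-12", 1, 3, 5, 2.50, 5.00),
    (9, 2, "2011-03-13", 2, 2, 0, 0.75, 1.00),
]

POSITIVE, NEGATIVE = 2, 3  # class_id values counted as positive / negative


def weighted_sums(tweets):
    """Map (sp100_id, nyse_date) to
    [pos_rt, pos_quality, pos_follow, neg_rt, neg_quality, neg_follow]."""
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0, 0.0, 0.0])
    for _, sp100_id, day, _, class_id, retweets, quality, follow in tweets:
        row = sums[(sp100_id, day)]  # every (id, date) that has a tweet gets a row
        if class_id == POSITIVE:
            offset = 0
        elif class_id == NEGATIVE:
            offset = 3
        else:
            continue  # other classes (e.g. class_id 1) add nothing
        # Add this tweet's retweets, quality and follow to its class's totals.
        row[offset] += retweets
        row[offset + 1] += quality
        row[offset + 2] += follow
    return dict(sums)


for key, values in sorted(weighted_sums(TWEETS).items()):
    print(key, values)
```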
I have a demo on SQLFiddle with the other necessary tables (I need to join to a daterange table because I also want to include rows with all zeros). I also have output from my query, but I don't understand why it differs from the desired output:
--------------------------------------------------------------------------------------
sp100_id  nyse_date   pos-rt  pos-quality  pos-follow  neg-rt  neg-quality  neg-follow
--------------------------------------------------------------------------------------
1         2011-03-11  0       0            0           0       0            0
1         2011-03-12  3       2            2           5       3            5
1         2011-03-13  3       8            12          3       1            1
2         2011-03-11  0       0            0           0       0            0
2         2011-03-12  3       2            2           5       3            5
2         2011-03-13  3       8            12          3       1            1
--------------------------------------------------------------------------------------
I don’t see where the problem lies. Do you? Your help would be greatly appreciated 🙂
The reason it wasn't returning the expected values is that you also need to include
sp100.sp100_id = tweets.sp100_id in the LEFT JOIN condition, along with the date. By joining only on the date, each row matches every tweet with that date, regardless of sp100_id. That is why your sums were thrown off: for each sp100_id, the SUM()s also included the values of all other sp100_ids. I also cleaned up your query a little bit (just in terms of aesthetics):
SQLFiddle Demo
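To see the effect of the missing join condition, here is a self-contained sketch using Python's standard sqlite3 module. It is not the query from the SQLFiddle (that query isn't reproduced here): the layouts of the sp100 and daterange tables, the daterange column name nyse_date, and the helpers build_db and daily_sentiment are hypothetical stand-ins. It cross joins every sp100_id with every date, then LEFT JOINs the tweets either on the date alone or on the date and sp100_id.

```python
import sqlite3

# Sample tweets from the question:
# (tweet_id, sp100_id, nyse_date, user_id, class_id, retweets, quality, follow)
TWEETS = [
    (1, 1, "2011-03-12", 1, 1, 0, 2.50, 5.00),
    (2, 1, "2011-03-13", 1, 2, 2, 2.50, 5.00),
    (3, 1, "2011-03-13", 1, 2, 1, 2.50, 5.00),
    (4, 1, "2011-03-13", 2, 2, 0, 0.75, 1.00),
    (5, 1, "2011-03-13", 2, 3, 3, 0.75, 1.00),
    (6, 2, "2011-03-12", 2, 2, 3, 0.75, 1.00),
    (7, 2, "2011-03-12", 2, 2, 0, 0.75, 1.00),
    (8, 2, "2011-03-12", 1, 3, 5, 2.50, 5.00),
    (9, 2, "2011-03-13", 2, 2, 0, 0.75, 1.00),
]

# pos_/neg_ sums; ELSE 0 makes an unmatched (all-NULL) tweet row count as 0.
SUMS = ",\n               ".join(
    f"SUM(CASE WHEN t.class_id = {cls} THEN t.{col} ELSE 0 END) AS {prefix}_{col}"
    for prefix, cls in (("pos", 2), ("neg", 3))
    for col in ("retweets", "quality", "follow")
)


def build_db():
    """In-memory database; sp100 and daterange are hypothetical stand-ins."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, sp100_id INTEGER,
                             nyse_date TEXT, user_id INTEGER, class_id INTEGER,
                             retweets INTEGER, quality REAL, follow REAL);
        CREATE TABLE sp100 (sp100_id INTEGER PRIMARY KEY);
        CREATE TABLE daterange (nyse_date TEXT PRIMARY KEY);
        INSERT INTO sp100 VALUES (1), (2);
        INSERT INTO daterange VALUES ('2011-03-11'), ('2011-03-12'), ('2011-03-13');
    """)
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?, ?, ?, ?, ?)", TWEETS)
    return conn


def daily_sentiment(conn, join_on_sp100=True):
    # Only fixed SQL fragments are interpolated here, never user input.
    extra = "AND t.sp100_id = s.sp100_id" if join_on_sp100 else ""
    sql = f"""
        SELECT s.sp100_id, d.nyse_date,
               {SUMS}
        FROM sp100 AS s
        CROSS JOIN daterange AS d          -- one row per (sp100_id, date)
        LEFT JOIN tweets AS t
               ON t.nyse_date = d.nyse_date {extra}
        GROUP BY s.sp100_id, d.nyse_date
        ORDER BY s.sp100_id, d.nyse_date
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for label, flag in (("date only", False), ("date + sp100_id", True)):
        print(label)
        for row in daily_sentiment(conn, join_on_sp100=flag):
            print("  ", row)
```

With join_on_sp100=False every sp100_id gets identical rows, which is the pattern in the question's output. The sp100_id condition belongs in the ON clause: moving it into a WHERE clause would drop the rows with no matching tweet, because t.sp100_id is NULL for them, and the all-zero rows would disappear.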