I have 5 tables that all have the same structure. Only the PAGEVISITS values differ between them,
e.g. table 1:
ITEM | PAGEVISITS | Commodity
1813 | 50 | Griddle
1851 | 10 | Griddle
11875 | 100 | Refrigerator
2255 | 25 | Refrigerator
and table 2:
ITEM | PAGEVISITS | Commodity
1813 | 0 | Griddle
1851 | 10 | Griddle
11875 | 25 | Refrigerator
2255 | 10 | Refrigerator
I want to add up the PAGEVISITS per Commodity for each table and get:
table1 | table2 | Commodity
60 | 10 | Griddle
125 | 35 | Refrigerator
Some of the results are correct, but some are WAY off with the query below:
SELECT
    SUM(MT.PAGEVISITS) as table1,
    SUM(CT1.PAGEVISITS) as table2,
    SUM(CT2.PAGEVISITS) as table3,
    SUM(CT3.PAGEVISITS) as table4,
    SUM(CT4.PAGEVISITS) as table5,
    (COUNT(DISTINCT MT.ITEM))
      + (COUNT(DISTINCT CT1.ITEM))
      + (COUNT(DISTINCT CT2.ITEM))
      + (COUNT(DISTINCT CT3.ITEM))
      + (COUNT(DISTINCT CT4.ITEM)) as Total,
    MT.Commodity
FROM table1 as MT
LEFT JOIN table2 CT1
    on MT.ITEM = CT1.ITEM
LEFT JOIN table3 CT2
    on MT.ITEM = CT2.ITEM
LEFT JOIN table4 CT3
    on MT.ITEM = CT3.ITEM
LEFT JOIN table5 CT4
    on MT.ITEM = CT4.ITEM
GROUP BY Commodity
I believe this may be caused by using the LEFT JOIN incorrectly. I have also tried an INNER JOIN, with the same inconsistent results.
I would do a UNION ALL on all five of those tables to get them as one rowset (an inline view), and then run a query on that, starting with something like this:
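A minimal sketch of that query, assuming the tables are named table1 through table5 as in your query and all share the item, pagevisits and commodity columns. The source labels 'A' through 'E' and the alias t are arbitrary names I picked for illustration, and IF() is MySQL-specific.

```sql
-- Sketch only: the labels 'A'..'E' and the alias t are illustrative choices.
SELECT SUM(IF(t.source = 'A', t.pagevisits, 0)) AS table1
     , SUM(IF(t.source = 'B', t.pagevisits, 0)) AS table2
     , SUM(IF(t.source = 'C', t.pagevisits, 0)) AS table3
     , SUM(IF(t.source = 'D', t.pagevisits, 0)) AS table4
     , SUM(IF(t.source = 'E', t.pagevisits, 0)) AS table5
     , t.commodity
  FROM ( -- inline view: every row from every table, tagged with its origin
         SELECT 'A' AS source, a.* FROM table1 a
          UNION ALL
         SELECT 'B', b.* FROM table2 b
          UNION ALL
         SELECT 'C', c.* FROM table3 c
          UNION ALL
         SELECT 'D', d.* FROM table4 d
          UNION ALL
         SELECT 'E', e.* FROM table5 e
       ) t
 GROUP BY t.commodity;
```

For the sample rows from table 1 and table 2 in the question, the table1 and table2 columns come out as 60 and 10 for Griddle, and 125 and 35 for Refrigerator.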
(But I would specify the column list for each of those tables, rather than using ‘.*’, so that my query does not depend on no one adding, dropping, renaming or reordering columns in any of those tables.)
I include an “extra” literal value (aliased as “source”) to identify which table the row came from. I can use a conditional test in an expression in the SELECT list, to figure out whether the row came from a particular table.
This approach is particularly flexible, and can be used to get more complicated resultsets. For example, I could also get the total number of page visits from table3, table4 and table5 added together, along with the individual counts.
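As a sketch of that kind of extension, the conditional test can match several sources at once. The column name table345 and the source labels are hypothetical names of my own, and the tables are assumed to have item, pagevisits and commodity columns.

```sql
-- Sketch only: 'A'..'E', t and table345 are illustrative names.
SELECT SUM(IF(t.source = 'A', t.pagevisits, 0)) AS table1
     , SUM(IF(t.source = 'B', t.pagevisits, 0)) AS table2
     , SUM(IF(t.source = 'C', t.pagevisits, 0)) AS table3
     , SUM(IF(t.source = 'D', t.pagevisits, 0)) AS table4
     , SUM(IF(t.source = 'E', t.pagevisits, 0)) AS table5
       -- combined total for the last three tables
     , SUM(IF(t.source IN ('C','D','E'), t.pagevisits, 0)) AS table345
     , t.commodity
  FROM ( SELECT 'A' AS source, a.item, a.pagevisits, a.commodity FROM table1 a
          UNION ALL
         SELECT 'B', b.item, b.pagevisits, b.commodity FROM table2 b
          UNION ALL
         SELECT 'C', c.item, c.pagevisits, c.commodity FROM table3 c
          UNION ALL
         SELECT 'D', d.item, d.pagevisits, d.commodity FROM table4 d
          UNION ALL
         SELECT 'E', e.item, e.pagevisits, e.commodity FROM table5 e
       ) t
 GROUP BY t.commodity;
```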
To get the equivalent of your COUNT(DISTINCT item) + COUNT(DISTINCT item) + ... expression, I would use an expression that makes a single value from both the “source” and “item” columns, being careful to have some sort of guarantee that any particular “source”+“item” will not create a duplicate of some other “source”+“item”. (If we just concatenate strings, for example, we don’t have any way to distinguish between ‘A’+‘11’ and ‘A1’+‘1’.) The most common approach I see here is a carefully chosen delimiter which is guaranteed not to appear in either value. We can distinguish between ‘A::11’ and ‘A1::1’, so something like this will work:
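A sketch of that expression, assuming '::' never appears in either the source label or the item value (the labels 'A' through 'E' and the alias t are again my own illustrative names):

```sql
-- Sketch only: counts distinct (source, item) pairs per commodity,
-- which equals the sum of the per-table COUNT(DISTINCT item) values.
SELECT COUNT(DISTINCT CONCAT(t.source, '::', t.item)) AS Total
     , t.commodity
  FROM ( SELECT 'A' AS source, a.item, a.commodity FROM table1 a
          UNION ALL
         SELECT 'B', b.item, b.commodity FROM table2 b
          UNION ALL
         SELECT 'C', c.item, c.commodity FROM table3 c
          UNION ALL
         SELECT 'D', d.item, d.commodity FROM table4 d
          UNION ALL
         SELECT 'E', e.item, e.commodity FROM table5 e
       ) t
 GROUP BY t.commodity;
```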
In your current query, if item is NULL, then the row doesn’t get included in the COUNT. To fully replicate that behavior, the combined expression has to evaluate to NULL whenever item is NULL.
Of course, getting a count of distinct item values over the whole set of five tables is much simpler (but then, it does return a different result).
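Both variants can be sketched side by side. The explicit CASE guard spells out the NULL handling rather than relying on the function: in MySQL, CONCAT() returns NULL when any argument is NULL, whereas CONCAT_WS() skips NULL arguments and would still produce a value. The column names Total and distinct_items, the source labels and the alias t are illustrative.

```sql
-- Sketch only: names other than item and commodity are illustrative.
SELECT COUNT(DISTINCT CASE WHEN t.item IS NOT NULL
                           THEN CONCAT(t.source, '::', t.item)
                      END) AS Total            -- per-table distinct items, summed
     , COUNT(DISTINCT t.item) AS distinct_items -- distinct items across all tables
     , t.commodity
  FROM ( SELECT 'A' AS source, a.item, a.commodity FROM table1 a
          UNION ALL
         SELECT 'B', b.item, b.commodity FROM table2 b
          UNION ALL
         SELECT 'C', c.item, c.commodity FROM table3 c
          UNION ALL
         SELECT 'D', d.item, d.commodity FROM table4 d
          UNION ALL
         SELECT 'E', e.item, e.commodity FROM table5 e
       ) t
 GROUP BY t.commodity;
```

An item that appears in two tables counts twice in Total but only once in distinct_items, which is why the two results differ.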
But to answer your question about the use of the LEFT JOIN, the left side table is the “driver”, so a matching row has to be in that table for a corresponding row to be retrieved from a table on the right. That is, unmatched rows from the tables on the right side will not be returned.
If what you have is basically five “partitions”, and you want to process all of the rows whether or not a matching row appears in any of the other “partitions”, I would go with the UNION ALL approach to simply concatenate all of the rows from all of those tables together, and process the rows as if they were from a single table.
NOTE: For very large tables, this may not be a feasible approach, since MySQL is going to have to materialize that inline view. There are other approaches which don’t require concatenating all of the rows together.
Specifying a list of only the columns you need, in the SELECT from each table, may help performance, if there are columns in those tables you don’t need to reference in your query.
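As one illustration of the kind of alternative hinted at in the note (my own sketch, not the only option), each table can be aggregated on its own first, so that the inline view holds only one row per commodity per table instead of every row. The alias s and the padding zeros are illustrative; this covers the page visit sums only, not the distinct item counts.

```sql
-- Sketch only: pre-aggregate each table, then combine the small results.
SELECT SUM(s.table1) AS table1
     , SUM(s.table2) AS table2
     , SUM(s.table3) AS table3
     , SUM(s.table4) AS table4
     , SUM(s.table5) AS table5
     , s.commodity
  FROM ( SELECT a.commodity
              , SUM(a.pagevisits) AS table1
              , 0 AS table2, 0 AS table3, 0 AS table4, 0 AS table5
           FROM table1 a GROUP BY a.commodity
          UNION ALL
         SELECT b.commodity, 0, SUM(b.pagevisits), 0, 0, 0
           FROM table2 b GROUP BY b.commodity
          UNION ALL
         SELECT c.commodity, 0, 0, SUM(c.pagevisits), 0, 0
           FROM table3 c GROUP BY c.commodity
          UNION ALL
         SELECT d.commodity, 0, 0, 0, SUM(d.pagevisits), 0
           FROM table4 d GROUP BY d.commodity
          UNION ALL
         SELECT e.commodity, 0, 0, 0, 0, SUM(e.pagevisits)
           FROM table5 e GROUP BY e.commodity
       ) s
 GROUP BY s.commodity;
```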