I’m running a query to find out how much estimated work was done on a factory floor and how much time was actually tracked, compared to the number of hours each station has available.
I’m doing this to determine which machines we need to purchase more of. Any operation with a usage factor over 100% is over capacity.
The issue is that I’m getting astronomically high numbers for some operations. It is impossible for 5 men, each working at one machine, to track more than 120 hours in a day, yet the result I am getting is well over a thousand.
What I do in the query is take all the batches, which have tasks, and sum the estimated time of each task. I also sum all of the time_elapsed in batch_log, and I multiply hours_open by the number of machines for that operation.
Because of this, deburr should have a max of 120 hours as they are open 24 hours a day and there are 5 deburring stations. Does anything glaring jump out when looking at this query?
Please let me know if you need more info.
SELECT
    DATE(bl.start_time) AS date_tracked,
    o.name AS operation,
    SUM(TIME_TO_SEC(bl.time_elapsed) / 3600) AS time_elapsed,
    SUM(t.estimated_nonrecurring + t.estimated_recurring) / 3600 AS estimated,
    o.hours_open AS hours_open,
    COUNT(DISTINCT m.id) AS machine_count,
    hours_open * COUNT(DISTINCT m.id) AS total_hours,
    (SUM(TIME_TO_SEC(bl.time_elapsed)) / 3600) / COUNT(DISTINCT m.id) AS time_elapsed_usage
FROM
    batches b
    INNER JOIN tasks t ON b.id = t.batch_id
    INNER JOIN batch_log bl ON b.id = bl.batch_id
    INNER JOIN operations o ON b.operation_id = o.id
    INNER JOIN machines m ON b.operation_id = m.operation_id
WHERE
    bl.time_elapsed < "8:00:00"
GROUP BY
    b.operation_id,
    DATE(bl.start_time)
ORDER BY
    date_tracked, o.id
So I’ve started again and once I get to this point I seem to have duplication in the time elapsed:
SELECT
    batches.operation_id,
    DATE(batch_log.start_time) AS date,
    SEC_TO_TIME(SUM(TIME_TO_SEC(batch_log.time_elapsed))) AS elapsed,
    SUM(tasks.estimated_nonrecurring + tasks.estimated_recurring) AS estimated_time
FROM
    batches
    INNER JOIN batch_log ON batches.id = batch_log.batch_id
    INNER JOIN tasks ON batches.id = tasks.batch_id
WHERE
    batches.id NOT IN (-1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14)
GROUP BY
    DATE(batch_log.start_time), operation_id
ORDER BY
    batch_log.start_time, batches.operation_id
EDIT: What am I doing wrong in the above? If I knew this I could be careful to structure queries better. Honestly, I haven’t been able to find anything and I’ve been digging through SQL books. Even if I could get an answer on the smaller statement I could make some progress. Working on other stuff for now.
Clarifications please…
Obviously batch_log has multiple records per batch.
The batches table has one row per distinct batch ID.
That said, here’s my review of your situation…
First, I am getting only the batch_log rows with time_elapsed less than
8:00:00, per your query, and aggregating them down to one row per qualifying
batch. With that pre-grouped, I can do a simple join to batches and tasks on
those batch IDs, and I can SUM() from tasks without worrying about
double-counting, because the starting basis is a single row per batch ID.
Grouping all of this by batch ID also simplifies the NEXT level of joining,
to the operations AND machines tables.
Then, for the other aggregates, I have pre-aggregated those as well, so
each returns a single record. That reduces the possibility of Cartesian
COUNT() and SUM() issues.
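To make the double-counting concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is not your MySQL schema: the tables are trimmed down, the column names elapsed_sec and estimated_sec are assumptions, and times are stored as plain seconds. One batch has 2 log rows and 3 tasks, so joining both child tables at once produces 2 x 3 = 6 rows and inflates both sums. Pre-aggregating each child table to one row per batch before joining gives the true totals.

```python
import sqlite3

# Tiny stand-in schema (assumed names, times stored as seconds).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batches   (id INTEGER PRIMARY KEY, operation_id INTEGER);
CREATE TABLE batch_log (batch_id INTEGER, elapsed_sec INTEGER);
CREATE TABLE tasks     (batch_id INTEGER, estimated_sec INTEGER);

INSERT INTO batches   VALUES (1, 10);
INSERT INTO batch_log VALUES (1, 3600), (1, 1800);            -- 2 log rows
INSERT INTO tasks     VALUES (1, 600), (1, 600), (1, 600);    -- 3 tasks
""")

# Joining both child tables directly: every log row pairs with every task.
naive = conn.execute("""
SELECT SUM(bl.elapsed_sec), SUM(t.estimated_sec)
FROM batches b
JOIN batch_log bl ON bl.batch_id = b.id
JOIN tasks t      ON t.batch_id = b.id
GROUP BY b.operation_id
""").fetchone()

# Pre-aggregating each child table to one row per batch, then joining.
fixed = conn.execute("""
SELECT SUM(bl.elapsed_sec), SUM(t.estimated_sec)
FROM batches b
JOIN (SELECT batch_id, SUM(elapsed_sec) AS elapsed_sec
      FROM batch_log GROUP BY batch_id) bl ON bl.batch_id = b.id
JOIN (SELECT batch_id, SUM(estimated_sec) AS estimated_sec
      FROM tasks GROUP BY batch_id) t ON t.batch_id = b.id
GROUP BY b.operation_id
""").fetchone()

print("naive:", naive)  # elapsed counted 3 times, estimate counted 2 times
print("fixed:", fixed)  # each child row counted once
```

In the real query the log side would also be grouped by DATE(start_time); if a batch logs time on more than one day, its task estimate would then repeat once per day, so that case needs its own decision about which day the estimate belongs to.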
With respect to machines: you have machines associated with an
operation, but you are then grouping by operation and date. That
being said, it appears an operation CAN (and does) cross dates, so
each machine would be accounted for on each day. Will that skew the
numbers? Not sure, I haven’t thought that far through.
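One way to keep the machine count from interacting with the other joins is to count machines once per operation in its own derived table and treat hours_open * machine_count as the capacity for each day. A minimal sketch, again in Python with SQLite and made-up data: the 24 hours and 5 machines come from the deburr example in the question, while the daily_elapsed table (standing in for log hours already summed per operation and day), the dates, and the hour figures are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE operations (id INTEGER PRIMARY KEY, name TEXT, hours_open INTEGER);
CREATE TABLE machines   (id INTEGER PRIMARY KEY, operation_id INTEGER);
-- Hypothetical stand-in for batch_log hours already summed per operation/day.
CREATE TABLE daily_elapsed (operation_id INTEGER, date_tracked TEXT, elapsed_hours REAL);

INSERT INTO operations VALUES (1, 'deburr', 24);
INSERT INTO machines (operation_id) VALUES (1), (1), (1), (1), (1);
INSERT INTO daily_elapsed VALUES (1, '2024-01-01', 90.0), (1, '2024-01-02', 132.0);
""")

rows = conn.execute("""
SELECT d.date_tracked,
       o.name,
       d.elapsed_hours,
       o.hours_open * mc.machine_count AS total_hours,
       d.elapsed_hours / (o.hours_open * mc.machine_count) AS usage_factor
FROM daily_elapsed d
JOIN operations o ON o.id = d.operation_id
JOIN (SELECT operation_id, COUNT(*) AS machine_count
      FROM machines
      GROUP BY operation_id) mc ON mc.operation_id = o.id
ORDER BY d.date_tracked
""").fetchall()

for row in rows:
    print(row)
```

Because the machine count is a single row per operation, it cannot multiply the log or task rows, and a usage_factor above 1.0 flags a day where tracked hours exceed the 120 available.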