I’m trying to take transactional data and cleanse it to meet my analysis needs. There are some limitations to how transactions are recorded into the database, and I am trying to get around those limitations.
When a customer places an order with more than 1 product, the transactional database doesn’t link the multiple products together. Each product will have a unique sales ID, but there is no way to group multiple sales ID into 1 order. Here is a sample:
OrderID MultOrderID CustomerID SalesDate SalesTime ProductID ProductCost ShippingCost
6082346 7661X0A 2012-06-12 959 105 99.99 7.95
6082347 5809812YY6Y T891002 2012-06-12 1005 222 99.95 7.95
6082348 5809812YY6Z T891002 2012-06-12 1005 273 22.95 1.00
6082349 5809812YY71 T891002 2012-06-12 1005 285 499.95 1.00
6082350 5809812YY72 T891002 2012-06-12 1005 172 49.95 1.00
6082351 5809812YY73 T891002 2012-06-12 1005 105 99.99 7.95
6082352 5809812YY74 X637251 2012-06-12 1010 285 499.95 7.95
6082353 5809812YY75 X637251 2012-06-12 1010 30 1024.99 1.00
6082354 T512AT0 2012-06-12 1017 172 49.95 7.95
An additional limitation to this transaction system is that it can not ship more than 4 products together. If the customer places an order for 5 products, 4 products are shipped together (and charged 1 shipping charge), the remaining product is shipped separately and charged another shipping charge (yes, the overall business wants to rebuild this entire legacy system….).
What I am trying to determine is the number of products shipped per order, and the aggregate product costs and shipping costs.
If you look at the last 4 characters of the MultOrderID, you’ll see that it’s sequential, YY6Y becomes YY6Z, then rolls over to YY71, YY72. The logic is standardized – I know that if the CustomerID, SalesDate and SalesTime are the same, then I can pair off the products together. What I don’t know is HOW I can accomplish this.
I believe the way to accomplish this is to break out the orders by CustomerID, SalesDate and SalesTime. Then, I get a for-loop, or something like that to cycle through the individual entries. Then, I look for the last 4 characters of the MultOrderID and say – If 1,2 and 3 are the same, and the 4th character is after the 4th character of the previous order, then pair it together, up to 4 orders. If the orderID is the 5th to 8th order in the range, then that’s shipment 2, etc.
Can this be done in SQL Server? If not in that, what should I write this in? And is a for-loop what I should be using in this case?
Edit: Here is the output I am trying to get to. Keep in mind that after the 4th product was shipped, I need to restart the ordering (so, 6 products get broken into 2 shipments [4 products and 2 products], 9 products into 3 shipments [4, 4, and 1].
PRODUCTSSHIPPED SALESDATE SALESTIME CUSTOMERID PRODUCTCOST SHIPPINGCOST
4 6/12/12 1005 T891002 672.8 10.95
1 6/12/12 1005 T891002 99.99 7.95
2 6/12/12 1010 X637251 1524.94 8.95
1 6/12/12 1017 T512AT0 49.95 7.95
1 6/12/12 959 7661X0A 99.99 7.95
Well from this statement it seems like you want this:
http://sqlfiddle.com/#!3/e0e71/30
So I’m not sure what you mean by using a foreach loop?
UPDATE:
Got it working using a subquery and ceiling function
Updated the fiddle
FYI SQL is:
I used ROW_NUMBER to get a running count of products for each order, then made this a subquery so I could do the grouping. The grouping just used the number of products divided by 4 (as float) and uses the ceiling function to round up to the nearest int to get it grouping into 4