Since I am unsure how to phrase the question, I will illustrate it with an example that is very similar to what I am trying to achieve.
I am looking for a way to optimize the amount of time it takes to perform the following task.
Suppose I have three sets of numbers labeled “A”, “B”, and “C”, each set containing an arbitrary number of integers.
I receive a stack of orders that ask for a “package” of numbers, each order asking for a particular combination of integers, one from each set. So an order might look like “A3, B8, C1”, which means I will need to grab a 3 from set A, an 8 from set B, and a 1 from set C.
The task is simple: grab an order, look at the numbers, then go collect them and put them together into a “package”.
It takes a while for me to collect the numbers, and often an order comes in asking for the same numbers as a previous order, so I decide to store all of the packages for later retrieval. That way, a duplicate order can be processed much faster, since I do not have to go and collect the same numbers again.
Collecting a number takes quite long, but on a day with a lot of orders, examining the stored packages one by one to find a match would take even longer.
So, for example, if I have the following sets of numbers and orders:
set A: [1, 2, 3]
set B: [4, 5, 6, 12, 18]
set C: [7, 8]
Order 1: A1, B6, C7
Order 2: A3, B5, C8
Order 3: A1, B6, C7
I would put together packages for orders 1 and 2, but then notice that order 3 is a duplicate of order 1, so I can reuse the package I put together for the first order and finish this last order quickly.
The goal is to minimize the time taken to process a stack of orders. So far I have come up with two methods, but there may be other ways to do this:
- Gather the numbers for each order, regardless of whether it is a duplicate or not. I end up with a lot of packages, and in extreme cases where someone places a bulk order for 50 identical packages, it is clearly a waste of time.
- Check whether the package already exists in a cache, perhaps using some sort of hashing method on the orders.
Any ideas?
There is not much detail given about how you fetch the data to compose packages, which makes it hard to suggest other solutions to your problem. For example, maybe existing packages could lead you to the data you need to compose new packages, even though they differ in one way or another. There are dedicated hashing methods for that kind of similarity lookup, such as locality-sensitive hashing.
Given the two approaches you came up with, it sounds very natural to go for route 2. Hashing on the indices sounds trivial (the first order is easily identified by the number 167, or the string “167”, right?), so using a hash has no real drawback. The one exception might be memory, since you need to keep old packages around. There are also common methods for deciding which packages to keep in the (hashed) cache and which ones to throw away, for example evicting the least recently used ones.
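To make route 2 concrete, here is a minimal sketch in Python, assuming an order can be represented as a tuple of three integers. The names `collect` and `get_package`, the counter `collect_calls`, and the cache size of 1024 are all made up for illustration; `collect` just stands in for whatever slow step you use to gather the numbers. The standard library's `functools.lru_cache` hashes the arguments for you and evicts the least recently used entries once the cache is full.

```python
from functools import lru_cache

# Hypothetical counter, only here to show how often the slow path runs.
collect_calls = 0


def collect(a, b, c):
    """Stand-in for the slow step of gathering one number from each set."""
    global collect_calls
    collect_calls += 1
    return (a, b, c)


# maxsize=1024 is an assumed bound; once the cache is full, the least
# recently used packages are evicted to keep memory in check.
@lru_cache(maxsize=1024)
def get_package(a, b, c):
    # The arguments (a, b, c) are hashed together and used as the cache key.
    return collect(a, b, c)


# The three orders from the question: A1 B6 C7, A3 B5 C8, A1 B6 C7.
orders = [(1, 6, 7), (3, 5, 8), (1, 6, 7)]
packages = [get_package(*order) for order in orders]
```

With the three example orders, `collect` runs only for the first two; the third is served from the cache. Keying on the tuple rather than a concatenated string like “167” also avoids ambiguity once values have more than one digit: the orders (1, 12, 7) and (11, 2, 7) would both concatenate to “1127”.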