I came up with the following implementation for the Greedy Set Cover after much

Question

0

Asked: May 26, 20262026-05-26T15:52:42+00:00 2026-05-26T15:52:42+00:00

I came up with the following implementation for the Greedy Set Cover after much

0

I came up with the following implementation for the Greedy Set Cover after much discussion regarding my original question here. From the help I received, I encoded the problem into a “Greedy Set Cover” and after receiving some more help here, I came up with the following implementation. I am thankful to everyone for helping me out with this. The following implementation works fine but I want to make it scalable/faster.

By scalable/faster, I mean to say that:

My dataset contains about 50K-100K sets in S
The number of elements in U itself is very small in the order of 100-500
The size of each set in S could be anywhere from 0 to 40

And here goes my attempt:

U = set([1,2,3,4])
R = U
S = [set([1,2]), 
     set([1]), 
     set([1,2,3]), 
     set([1]), 
     set([3,4]), 
     set([4]), 
     set([1,2]), 
     set([3,4]), 
     set([1,2,3,4])]
w = [1, 1, 2, 2, 2, 3, 3, 4, 4]

C = []
costs = []

def findMin(S, R):
    minCost = 99999.0
    minElement = -1
    for i, s in enumerate(S):
        try:
            cost = w[i]/(len(s.intersection(R)))
            if cost < minCost:
                minCost = cost
                minElement = i
        except:
            # Division by zero, ignore
            pass
    return S[minElement], w[minElement]

while len(R) != 0:
    S_i, cost = findMin(S, R)
    C.append(S_i)
    R = R.difference(S_i)
    costs.append(cost)

print "Cover: ", C
print "Total Cost: ", sum(costs), costs

I am not an expert in Python but any Python-specific optimizations to this code would be really nice.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-26T15:52:43+00:00

What sorts of times are you getting vs what you need? Surely most of the execution time is spent in c-level code finding set intersections, so there’s not much optimization you can do? With some random data (results may vary with your data of course, not sure if these are good values) of 100000 sets, 40 elements in each set, 500 unique elements, weights random from 1 to 10,

print 'generating test data'    
num_sets = 100000
set_size = 40
elements = range(500)
U = set(elements)
R = U
S = []
for i in range(num_sets):
    random.shuffle(elements)
    S.append(set(elements[:set_size]))
w = [random.randint(1,100) for i in xrange(100)]

C = []
costs = []

I got performance like this with cProfile:

         8200209 function calls in 14.391 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   14.391   14.391 <string>:1(<module>)
       41    4.802    0.117   14.389    0.351 test.py:23(findMin)
        1    0.001    0.001   14.391   14.391 test.py:40(func)
  4100042    0.428    0.000    0.428    0.000 {len}
       82    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
       41    0.001    0.000    0.001    0.000 {method 'difference' of 'set' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  4100000    9.160    0.000    9.160    0.000 {method 'intersection' of 'set' objects}

Hm, so mostly apparently 1/3 of the time isn’t in set intersections. But I personally wouldn’t optimize any more, especially at the cost of clarity. There’s not going to be much you can do with the other 2/3, so why bother?

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I came up with the following implementation for the Greedy Set Cover after much

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply