In python, I am working on a project that regularly involves computing how many objects with some given properties match certain conditions. I can see how to do this with lists of tuples or objects or a database, but I wonder if filtering a list of objects this way is the ‘obvious pythonic’ way to do it.
The options I thought of look like this:
list_of_all = [object_type(property0, property1, ...), ...]
number_of_matches = len(filter(object_type.property2_test(property2),
filter(object_type.property1_getter, list_of_all)
list_of_all = [object_type(property0, property1, ...), ...]
number_of_matches = len([0 for candidate in list_of_all
if candidate.property1 and candidate.property2 == property2])
list_of_all = [(property0, property1, ...), ...]
number_of_matches = len([0 for candidate in list_of_all
if candidate[1] and candidate[2] == property2])
db_cursor.execute("""CREATE TABLE table_of_all
(property0 INTEGER, property1 INTEGER, ...)""")
number_of_matches = len(db_cursor.execute("""SELECT 1 FROM table_of_all
WHERE property1 = 1 AND property2 = ?""", (property2,)).fetchall())
Using the last two in context of my code, timeit tells me that there is so significant difference.
$ python -m timeit -n 100 'import with_db' | tail -n1
100 loops, best of 3: 0.751 usec per loop
$ python -m timeit -n 100 'import with_list' | tail -n1
100 loops, best of 3: 0.761 usec per loop
with both values somewhere between 0.751 and 0.811 every time.
In general, my use case consists of a few hundred (for testing purposes) up to at least 40000 objects. They are used in a simulation loop (no I/O to wait for). Every loop iteration consists of around 50 of these lookups, possibly finding the maximum of one property, and the update of two objects, but if there are other use cases where there is a best solution, that would also be interesting.
Is there a distinct better solution for this type of task, or is there a distinct type of task where chosing any of these solutions makes a difference?
You can let the database do the counting:
In Python you can avoid creating an intermediate list by using
sumand a generator expression: