A common pattern in my code is: “search through a list until I find a particular element, then look at the elements that come before and after it.”
As an example, I might want to look through a log file where important events are marked with asterisks, and then pull out the context of the important event.
In the following example, I want to know why the hyperdrive exploded:
Spinning up the hyperdrive
Hyperdrive speed 100 rpm
Hyperdrive speed 200 rpm
Hyperdrive lubricant levels low (100 gal.)
* CRITICAL EXISTENCE FAILURE
Hyperdrive exploded
I want a function, get_item_with_context(), that allows me to find the first line with an asterisk, and then gives me up to n lines preceding it, and m lines following it.
My attempt is below:
import collections, itertools

def get_item_with_context(predicate, iterable, items_before = 0, items_after = 0):
    # Searches through `iterable` until an item matching `predicate` is found.
    # Then return that item.
    # If no item matching predicate is found, return None.
    # Optionally, also return up to `items_before` items preceding the target, and
    # `items_after` items after the target.
    #
    # Note:
    d = collections.deque (maxlen = items_before + 1 + items_after)
    iter1 = iterable.__iter__()
    iter2 = itertools.takewhile(lambda x: not(predicate(x)), iter1)
    d.extend(iter2)

    # zero-length input, or no matching item
    if len(d) == 0 or not(predicate(d[-1])):
        return None

    # get context after match:
    try:
        for i in xrange(items_after):
            d.append(iter1.next())
    except StopIteration:
        pass

    if ( items_before == 0 and items_after == 0):
        return d[0]
    else:
        return list(d)
Usage should be like:
>>> get_item_with_context(lambda x: x == 3, [1,2,3,4,5,6],
...                       items_before = 1, items_after = 1)
[2, 3, 4]
Problems with this:
- Checking to make sure we actually found a match, using not(predicate(d[-1])), doesn’t work for some reason. It always returns false.
- If there are fewer than items_after items in the list after the matching item is found, then the results are rubbish.
- Other edge cases?
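A likely explanation for the first problem, offered as a sketch rather than a confirmed diagnosis: itertools.takewhile has to pull the matching item out of iter1 in order to test it, and then discards it, so the match never reaches the deque. The snippet below (Python 3, with illustrative variable names that are not part of the original code) shows the effect in isolation:

```python
import itertools

items = iter([1, 2, 3, 4, 5, 6])

# takewhile stops at the first element for which the condition is false,
# but it has already pulled that element (3) out of the iterator to test it.
before = list(itertools.takewhile(lambda x: x != 3, items))

# Whatever is left in the shared iterator therefore starts after the match.
remaining = list(items)

print(before)     # [1, 2]
print(remaining)  # [4, 5, 6]
```

If that is what is happening, d[-1] is always the last non-matching item, so not(predicate(d[-1])) is true whenever d is non-empty, and the function returns None even when a match exists.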
Can I please have some advice on how to make this work / make it more robust? Or, if I’m reinventing the wheel, feel free to tell me that too.
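One possible way to get the behaviour described above is to walk the iterator with a plain loop, keep a bounded deque of the items seen so far, and use itertools.islice for the trailing context. This is a minimal sketch, assuming Python 3. It also makes one deliberate change from the attempt above: it always returns a list (or None), even when both context sizes are zero.

```python
import collections
import itertools


def get_item_with_context(predicate, iterable, items_before=0, items_after=0):
    """Return the first item matching `predicate`, with surrounding context.

    Returns None if nothing matches. Otherwise returns a list holding up to
    `items_before` preceding items, the match itself, and up to `items_after`
    following items (fewer if the input runs out).
    """
    iterator = iter(iterable)
    # Rolling window of the most recent non-matching items.
    before = collections.deque(maxlen=items_before)
    for item in iterator:
        if predicate(item):
            # islice simply stops early if fewer items remain.
            after = list(itertools.islice(iterator, items_after))
            return list(before) + [item] + after
        before.append(item)
    return None


log = [
    "Spinning up the hyperdrive",
    "Hyperdrive speed 100 rpm",
    "Hyperdrive speed 200 rpm",
    "Hyperdrive lubricant levels low (100 gal.)",
    "* CRITICAL EXISTENCE FAILURE",
    "Hyperdrive exploded",
]
context = get_item_with_context(lambda line: line.startswith("*"), log,
                                items_before=2, items_after=1)
print(context)
```

Because the match is handled inside the loop, it is never lost to a helper iterator, and running out of input on either side just yields a shorter list.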
This appears to handle edge cases correctly: