In Python 3, how can I check whether an object is a container (rather than an iterator that may allow only one pass)?
Here’s an example:
def renormalize(cont):
    '''
    each value from the original container is scaled by the same factor
    such that their total becomes 1.0
    '''
    total = sum(cont)
    for v in cont:
        yield v/total
list(renormalize(range(5))) # [0.0, 0.1, 0.2, 0.3, 0.4]
list(renormalize(k for k in range(5))) # [] - a bug!
Obviously, when the renormalize function receives a generator expression, it does not work as intended. It assumes it can iterate through the container multiple times, while the generator allows only one pass through it.
Ideally, I’d like to do this:
def renormalize(cont):
    if not is_container(cont):
        raise ContainerExpectedException
    # ...
How can I implement is_container?
I suppose I could check if the argument is empty right as we’re starting to do the second pass through it. But this approach doesn’t work for more complicated functions where it’s not obvious when exactly the second pass starts. Furthermore, I’d rather put the validation at the function entrance, rather than deep inside the function (and shift it around whenever the function is modified).
I can of course rewrite the renormalize function to work correctly with a one-pass iterator. But that requires copying the input data to a container. The performance impact of copying millions of large lists “just in case they are not lists” is ridiculous.
EDIT: My original example used a weighted_average function:
def weighted_average(c):
    '''
    returns weighted average of a container c
    c contains values and weights in tuples
    weights don't need to sum up to 1 (automatically renormalized)
    '''
    return sum((v * w for v, w in c)) / sum((w for v, w in c))
weighted_average([(0,1), (1,1)]) # 0.5
weighted_average([(k, 1) for k in range(2)]) # 0.5
weighted_average((k, 1) for k in range(2)) # mistake
But it was not the best example since the version of weighted_average rewritten to use a single pass is arguably better anyway:
def weighted_average(it):
    '''
    returns weighted average of an iterator it
    it yields values and weights in tuples
    weights don't need to sum up to 1 (automatically renormalized)
    '''
    total_value = 0
    total_weight = 0
    for v, w in it:
        # accumulate the weighted value, matching sum(v * w ...) above
        total_value += v * w
        total_weight += w
    return total_value / total_weight
Although all iterables should subclass collections.abc.Iterable (originally available as collections.Iterable), not all of them do, unfortunately. Here is an answer based on what interface the objects implement, instead of what they “declare”.
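To illustrate the gap between declaring and implementing, here is a minimal sketch. The class name LegacySequence is hypothetical; it supports iteration only through the older __getitem__ protocol, so iter() accepts it while the abstract base class does not recognize it.

```python
from collections.abc import Iterable

class LegacySequence:
    """Hypothetical class: iterable only via the legacy __getitem__ protocol."""
    def __init__(self, data):
        self._data = list(data)

    def __getitem__(self, index):
        # iter() calls this with 0, 1, 2, ... until IndexError is raised
        return self._data[index]

seq = LegacySequence([1, 2, 3])
declared = isinstance(seq, Iterable)  # False: there is no __iter__ to detect
first_pass = list(seq)                # [1, 2, 3]
second_pass = list(seq)               # [1, 2, 3]: it can be iterated again
```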
Short answer:
A “container” as you call it, i.e. a list or tuple that can be iterated over more than once, as opposed to a generator that will be exhausted, will typically implement both __iter__ and __getitem__. Hence you can test for those two methods with hasattr.
Long answer:
However, you can make an iterable that will not be exhausted and does not support __getitem__. For example, a function that generates prime numbers. You could repeat the generation many times if you want, but having a function to retrieve the 1065th prime would take a lot of calculation, so you may not want to support that. 🙂
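A minimal sketch of the short-answer check and of the kind of object it misses. The Countdown class is a hypothetical stand-in for the prime generator mentioned above: it can be iterated any number of times but has no __getitem__.

```python
def is_container(obj):
    # Short-answer heuristic: re-iterable containers usually define both methods
    return hasattr(obj, '__iter__') and hasattr(obj, '__getitem__')

class Countdown:
    """Hypothetical re-iterable object that does not support indexing."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # A fresh iterator is created on every call, so each pass starts over
        return iter(range(self.start, 0, -1))

results = {
    'list': is_container([1, 2, 3]),                  # True
    'range': is_container(range(5)),                  # True
    'generator': is_container(k for k in range(5)),   # False
    'set': is_container({1, 2, 3}),                   # False, although re-iterable
    'Countdown': is_container(Countdown(3)),          # False, although re-iterable
}
```

Note that the built-in set already falls through this check, so the heuristic rejects some perfectly good containers.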
So is there any more “reliable” way?
Well, all iterables will implement an __iter__ function that returns an iterator. The iterators will have a __next__ function, which is what is used when iterating over them. Calling __next__ repeatedly will in the end exhaust the iterator.
So if it has a __next__ function it is an iterator, and will be exhausted.
Iterables that are not yet iterators will not have a __next__ function, but will implement an __iter__ function that returns an iterator. So you can check that the object has __iter__ but that it does not have __next__.
Iterators also have an __iter__ function, which returns self. Hence, you can also do the check by calling iter() on the object and testing whether the result is the object itself.
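The two checks just described can be sketched as follows. The function names is_container_by_attrs and is_container_by_iter are only illustrative, and the TypeError handling for non-iterable arguments is an assumption about how one would want such a helper to behave.

```python
def is_container_by_attrs(obj):
    # Iterable (has __iter__) but not itself an iterator (no __next__)
    return hasattr(obj, '__iter__') and not hasattr(obj, '__next__')

def is_container_by_iter(obj):
    # Iterators return themselves from iter(); containers return a new iterator
    try:
        return iter(obj) is not obj
    except TypeError:
        # iter() raises TypeError when obj is not iterable at all
        return False

samples = [[1, 2, 3], range(5), {1, 2}, (k for k in range(5)), iter([1, 2]), 42]
by_attrs = [is_container_by_attrs(s) for s in samples]
by_iter = [is_container_by_iter(s) for s in samples]
# Both lists come out as [True, True, True, False, False, False]
```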
That would fail if you implement an object that returns a broken iterator, one that does not return self when you call iter() on it again. But then your (or a third-party module's) code is actually doing things wrong.
It does depend on making an iterator though, and hence on calling the object's __iter__, which in theory may have side effects, while the above hasattr calls should not have side effects. OK, so hasattr calls __getattribute__, which could have them. But you can fix that by looking the attributes up through object.__getattribute__ directly instead of hasattr. That version is reasonably safe, and should work in all cases except if the object generates __next__ or __iter__ dynamically on __getattribute__ calls, but if you do that you are insane. 🙂
Instinctively my preferred version would be iter(o) is o, but I haven’t ever needed to do this, so that’s not based on experience.
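For reference, the side-effect-averse check mentioned above might look like the sketch below. It assumes the lookup goes through object.__getattribute__, which skips any __getattribute__ or __getattr__ defined by the object's own class; treat it as one possible reading rather than a definitive implementation.

```python
def is_container(obj):
    # Look attributes up through object.__getattribute__ so that a custom
    # __getattribute__ or __getattr__ on obj's class is not invoked.
    try:
        object.__getattribute__(obj, '__iter__')
    except AttributeError:
        return False  # not iterable through __iter__
    try:
        object.__getattribute__(obj, '__next__')
    except AttributeError:
        return True   # iterable, and not an iterator
    return False      # has __next__: an iterator that will be exhausted

checks = [
    is_container([1, 2, 3]),               # True
    is_container(range(5)),                # True
    is_container(k for k in range(5)),     # False
    is_container(42),                      # False
]
```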