I want to test whether two or more values are members of a list, but I’m getting an unexpected result:
>>> 'a','b' in ['b', 'a', 'foo', 'bar']
('a', True)
So, can Python test the membership of multiple values in a list at once?
What does that result mean?
See also: How to find list intersection? Checking whether any of the specified values is in the list is equivalent to checking whether the intersection is non-empty. Checking whether all of the values are in the list is equivalent to checking whether they form a subset.
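A small sketch of those two equivalences, using made-up values (`haystack`, `wanted` and `others` are illustrative names, not from the question). The set versions assume every value is hashable.

```python
haystack = ['b', 'a', 'foo', 'bar']
wanted = ['a', 'b']
others = ['a', 'zzz']

# "Any of the values is present" <=> the intersection is non-empty
any_present = any(x in haystack for x in others)
nonempty_intersection = bool(set(others) & set(haystack))
print(any_present, nonempty_intersection)   # True True

# "All of the values are present" <=> the values form a subset
all_present = all(x in haystack for x in others)
is_subset = set(others) <= set(haystack)
print(all_present, is_subset)               # False False

# Both values from the question are present, so both tests agree on True.
print(all(x in haystack for x in wanted), set(wanted) <= set(haystack))   # True True
```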
The expression all(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b']) does what you want (it returns True), and will work in nearly all cases.
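A quick check of the question's example: the first part shows how Python groups the comma and `in`, and the second shows the all() form on the same list.

```python
lst = ['b', 'a', 'foo', 'bar']

# The comma builds a tuple, and `in` binds more tightly than the comma,
# so only 'b' is tested for membership; 'a' is just the first element.
result = 'a', 'b' in lst
print(result)                              # ('a', True)
print(result == ('a', ('b' in lst)))       # True

# Testing every value explicitly gives the intended answer.
print(all(x in lst for x in ['a', 'b']))   # True
print(all(x in lst for x in ['a', 'z']))   # False
```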
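The expression 'a','b' in ['b', 'a', 'foo', 'bar'] doesn’t work as expected because Python interprets it as the tuple ('a', ('b' in ['b', 'a', 'foo', 'bar'])). The in operator binds more tightly than the comma, so only 'b' is tested for membership, and 'a' simply becomes the first element of the tuple. That is where ('a', True) comes from.
Other Options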
There are other ways to execute this test, but they won’t work for as many different kinds of inputs. As Kabie points out, you can solve this problem using sets…
…sometimes:
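A sketch of the set-based versions, together with the cases where the approaches break down. The nested-list data is made up for illustration: set() raises TypeError for unhashable elements such as lists, the all() form still works on them, and a generator used as the container gets consumed by the first lookup.

```python
lst = ['b', 'a', 'foo', 'bar']

# Subset tests: issubset() accepts any iterable, while <= needs sets on both sides.
print(set(['a', 'b']).issubset(lst))   # True
print({'a', 'b'} <= set(lst))          # True

# Lists are unhashable, so a list of lists cannot be turned into a set.
nested = [[1, 2], [3, 4], [5]]
try:
    set([[1, 2], [5]]) <= set(nested)
    sets_worked = True
except TypeError as exc:
    sets_worked = False
    print("TypeError:", exc)

# The generator-expression form only compares with ==, so it still works here.
nested_result = all(x in nested for x in [[1, 2], [5]])
print(nested_result)                   # True

# `container` must be re-iterable: a generator is consumed by the first `in`
# test, so the second lookup sees an exhausted iterator and fails.
gen_container = (c for c in ['a', 'b'])
gen_result = all(x in gen_container for x in ['b', 'a'])
print(gen_result)                      # False, even though both values were there
```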
Sets can only be created with hashable elements. But the generator expression all(x in container for x in items) can handle almost any container type. The only requirement is that container be re-iterable (i.e. not a generator). And items can be any iterable at all.
Speed Tests
In many cases, the subset test will be faster than all, but the difference isn’t shocking, except when the question is moot because sets aren’t an option. Converting lists to sets just for the purpose of a test like this won’t always be worth the trouble. And converting generators to sets can sometimes be incredibly wasteful, slowing programs down by many orders of magnitude.
Here are a few benchmarks for illustration. The biggest difference comes when both container and items are relatively small. In that case, the subset approach is about an order of magnitude faster.
This looks like a big difference. But as long as container is a set, all is still perfectly usable at vastly larger scales. Using subset testing is still faster, but only by about 5x at this scale. The speed boost is due to Python’s fast C-backed implementation of set, but the fundamental algorithm is the same in both cases.
If your items are already stored in a list for other reasons, then you’ll have to convert them to a set before using the subset test approach. Then the speedup drops to about 2.5x. And if your container is a sequence and needs to be converted first, then the speedup is even smaller.
The only time we get disastrously slow results is when we leave container as a sequence. And of course, we’ll only do that if we must. If all the items in bigseq are hashable, then we’ll convert bigseq to a set and run all against that instead. The alternative, set(bigseq) >= set(bigsubseq) (timed above at 4.36), is only about 1.66x faster than that.
So subset testing is generally faster, but not by an incredible margin. On the other hand, let’s look at when all is faster. What if items is ten million values long, and is likely to have values that aren’t in container? Converting the generator into a set turns out to be incredibly wasteful in this case. The set constructor has to consume the entire generator. But the short-circuiting behavior of all ensures that only a small portion of the generator needs to be consumed, so it’s faster than a subset test by four orders of magnitude.
This is an extreme example, admittedly. But as it shows, you can’t assume that one approach or the other will be faster in all cases.
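A scaled-down sketch of that extreme case. The sizes are arbitrary assumptions (100,000 items instead of ten million), and the timings will vary by machine. A counting generator (`counted_items`, a hypothetical helper) shows how many items each approach pulls before it can answer.

```python
import timeit

# Scaled-down stand-ins for the ten-million-item example (arbitrary sizes).
container = set(range(1_000))  # holds 0..999
N = 100_000                    # length of the items generator

def counted_items(counter):
    """Yield 0..N-1, adding 1 to counter[0] for every value produced.

    Only 0..999 are in `container`, so 1000 is the first value that misses.
    """
    for i in range(N):
        counter[0] += 1
        yield i

# all() stops at the first missing value, so it pulls only 1,001 items.
pulled_all = [0]
all_result = all(x in container for x in counted_items(pulled_all))
print(all_result, pulled_all[0])        # False 1001

# set() must drain the whole generator before the subset test can run.
pulled_set = [0]
subset_result = set(counted_items(pulled_set)) <= container
print(subset_result, pulled_set[0])     # False 100000

# Rough wall-clock comparison; absolute numbers depend on the machine.
t_all = timeit.timeit(lambda: all(x in container for x in counted_items([0])), number=10)
t_set = timeit.timeit(lambda: set(counted_items([0])) <= container, number=10)
print(f"all(): {t_all:.4f}s   subset: {t_set:.4f}s")
```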
The Upshot
Most of the time, converting container to a set is worth it, at least if all its elements are hashable. That’s because the in operator is O(1) on average for sets, but O(n) for sequences.
On the other hand, using subset testing is probably only worth it sometimes. Definitely do it if your test items are already stored in a set. Otherwise, all is only a little slower, and doesn’t require any additional storage. It can also be used with large generators of items, and sometimes provides a massive speedup in that case.
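One way to package that advice is a small helper. The name `contains_all` and its fallback behavior are hypothetical, a sketch rather than a standard API. It converts container to a set when every element is hashable, keeps it as a list otherwise, and uses the short-circuiting all() in both cases, so items can be a large generator.

```python
from collections.abc import Iterable

def contains_all(container: Iterable, items: Iterable) -> bool:
    """Return True if every value in `items` is in `container` (hypothetical helper).

    `container` may be any iterable, even a generator; it is materialized once.
    `items` may be any iterable; all() stops at the first missing value.
    Assumption: if `container` ends up as a set, the values in `items` must be
    hashable too, otherwise the `in` test raises TypeError.
    """
    if not isinstance(container, (set, frozenset)):
        elements = list(container)       # make the container re-iterable
        try:
            container = set(elements)    # O(1) average-case lookups
        except TypeError:
            container = elements         # unhashable elements: O(n) lookups
    return all(x in container for x in items)

print(contains_all(['b', 'a', 'foo', 'bar'], ['a', 'b']))        # True
print(contains_all([[1, 2], [3]], [[3], [4]]))                   # False
print(contains_all(range(1000), (i for i in range(10**7))))      # False, stops at 1000
```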