In Python, set is pretty handy for comparing two lists of strings (see this link). I was wondering if there’s a good solution for C++ in terms of performance, as each list has over 1 million strings in it.
It’s case-sensitive matching.
The data types std::set<> (usually implemented as a balanced tree) and std::unordered_set<> (from C++11, implemented as a hash table) are available. There is also a convenience algorithm called std::set_intersection that computes the actual intersection. Here is an example.
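For illustration, a minimal sketch of the std::set plus std::set_intersection approach might look like the following. The function name common_strings and the choice to take the two lists as std::vector<std::string> are assumptions made for this example, not something from the question. Comparison uses the default operator< on std::string, so matching is case-sensitive.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the strings present in both lists.
std::set<std::string> common_strings(const std::vector<std::string>& first,
                                     const std::vector<std::string>& second) {
    // Building a std::set sorts the strings and drops duplicates.
    const std::set<std::string> a(first.begin(), first.end());
    const std::set<std::string> b(second.begin(), second.end());

    std::set<std::string> common;
    // std::set_intersection needs sorted input ranges; std::set iterates in sorted order.
    std::set_intersection(a.begin(), a.end(),
                          b.begin(), b.end(),
                          std::inserter(common, common.end()));
    return common;
}
```

If the lists can be reordered in place, sorting the two vectors with std::sort and running std::set_intersection directly on them avoids building the trees at all, which may be worth measuring for lists of this size.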
Note. If you want to use std::unordered_set<>, then std::set_intersection cannot be used like this, because it expects the input ranges to be sorted. You’d have to use the usual technique of a for-loop iterating over the smaller set and looking up each element in the larger one to determine the intersection. Nevertheless, for a large number of elements (especially strings), the hash-based std::unordered_set<> may be faster. There are also STL-compatible implementations such as the one in Boost (boost::unordered_set) and the ones created by Google (sparse_hash_set and dense_hash_set). For various other implementations and benchmarks (including one for strings), see here.
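The for-loop technique for hash sets could be sketched roughly as follows. The function name hashed_intersection is made up for this example, and it assumes both inputs have already been loaded into std::unordered_set<std::string>. Lookups use the default std::hash and operator== for std::string, so matching stays case-sensitive.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: intersection of two hash sets.
std::unordered_set<std::string> hashed_intersection(
        const std::unordered_set<std::string>& a,
        const std::unordered_set<std::string>& b) {
    // Iterate over the smaller set and probe the larger one,
    // so the number of lookups is bounded by the smaller size.
    const auto& smaller = (a.size() <= b.size()) ? a : b;
    const auto& larger  = (a.size() <= b.size()) ? b : a;

    std::unordered_set<std::string> common;
    for (const auto& s : smaller) {
        if (larger.find(s) != larger.end()) {
            common.insert(s);
        }
    }
    return common;
}
```

Each lookup is constant time on average, so the loop does work roughly proportional to the size of the smaller set. Whether this beats the sorted approach for a given data set is something to benchmark rather than assume.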