Suppose now you have a group of data:
Data 1: (1, 2);
Data 2: (1, 3);
Data 3: (7, 8);
Data 4: (8, 20);
Now the task is to merge a data set with another data set if they share a common element. In our example, Data 1 will be merged with Data 2 because they share the number 1, and so will Data 3 and Data 4, which share the number 8. My question is how to implement this in C++ in a very efficient way. For the time being, my implementation is based on a std::vector<std::set<int> > data structure, as illustrated in the following code:
#include <iostream>
#include <map>
#include <set>
#include <algorithm>
#include <vector>
using namespace std;
bool find_the_element(const set<int> &mysets, const vector<int> &myvector)
{
    for(int i=0; i<myvector.size(); i++)
    {
        set<int>::iterator it;
        it = mysets.find(myvector[i]);
        if (it != mysets.end())
            return true;
    }
    return false;
}
int main ()
{
    set<vector<int> > myset;
    vector<int> a;
    a.push_back(1);
    a.push_back(2);
    vector<int> b;
    b.push_back(1);
    b.push_back(3);
    vector<int> c;
    c.push_back(7);
    c.push_back(8);
    vector<int> d;
    d.push_back(8);
    d.push_back(20);
    vector<vector<int> > my_vector_array;
    my_vector_array.push_back(a);
    my_vector_array.push_back(b);
    my_vector_array.push_back(c);
    my_vector_array.push_back(d);
    vector<set<int> > my_sets;
    for(int i=0; i<my_vector_array.size(); i++)
    {
        vector<int> temp_vector = my_vector_array[i];
        if (my_sets.empty())
        {
            set<int> temp_set;
            for(int j=0; j<temp_vector.size(); j++)
                temp_set.insert(temp_vector[j]);
            my_sets.push_back(temp_set);
        }
        else
        {
            bool b_find = false;
            for(int j=0; j<my_sets.size(); j++)
            {
                set<int>temp_set;
                temp_set = my_sets[j];
                if (find_the_element(temp_set,temp_vector))
                {
                    b_find = true;
                    my_sets[j].insert(temp_vector.begin(), temp_vector.end());
                    break;
                }
            }
            if (b_find)
            {
                // something already done
            }
            else
            {
                set<int> temp_set;
                for(int j=0; j<temp_vector.size(); j++)
                    temp_set.insert(temp_vector[j]);
                my_sets.push_back(temp_set);
            }
        }
    }
}
I was wondering whether there is a more effective data structure in C++, or a more efficient algorithm, to do the job. Thanks!
One of the most efficient ways to implement sets that can be quickly merged is the disjoint-set (union-find) data structure.
The idea is to represent each set initially as a linked list, with the head of the list serving as the identifier for the entire set. As sets get merged, nodes are re-pointed to the head to speed up further searches.
The article at the link has pseudo-code; a C++ implementation should not be too difficult.
You would need to keep a separate map that connects the integers that you have seen so far with their nodes within the disjoint-set forest. You would go through your data sets, take their items one by one, look up each item in the map, and either follow the link to its set, or create a new "singleton" disjoint set for the item that you are adding. Items that come from the same data set are then united into one set.
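As a rough illustration of the approach described above, here is a minimal sketch. The names DisjointSets and merge_groups are hypothetical (they do not come from the question or from any library), and the forest is stored as index vectors with path compression and union by size, which is one common way to write it rather than the only one.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical disjoint-set forest: nodes are indices into parent_.
class DisjointSets {
public:
    // Creates a new singleton set and returns its node index.
    std::size_t make_set() {
        parent_.push_back(parent_.size());
        size_.push_back(1);
        return parent_.size() - 1;
    }

    // Returns the root (identifier) of the set that contains node x.
    std::size_t find(std::size_t x) {
        std::size_t root = x;
        while (parent_[root] != root)
            root = parent_[root];
        // Path compression: re-point every node on the path to the root.
        while (parent_[x] != root) {
            std::size_t next = parent_[x];
            parent_[x] = root;
            x = next;
        }
        return root;
    }

    // Merges the sets containing a and b (union by size).
    void unite(std::size_t a, std::size_t b) {
        a = find(a);
        b = find(b);
        if (a == b)
            return;
        if (size_[a] < size_[b])
            std::swap(a, b);
        parent_[b] = a;
        size_[a] += size_[b];
    }

private:
    std::vector<std::size_t> parent_;
    std::vector<std::size_t> size_;
};

// Hypothetical helper: merges all groups that share at least one element.
std::vector<std::set<int> > merge_groups(
    const std::vector<std::vector<int> >& groups) {
    DisjointSets forest;
    std::map<int, std::size_t> node_of;  // integer -> node in the forest

    for (std::size_t i = 0; i < groups.size(); ++i) {
        bool have_first = false;
        std::size_t first = 0;
        for (std::size_t j = 0; j < groups[i].size(); ++j) {
            const int value = groups[i][j];
            std::map<int, std::size_t>::iterator it = node_of.find(value);
            if (it == node_of.end())  // unseen integer: new singleton set
                it = node_of.insert(
                    std::make_pair(value, forest.make_set())).first;
            if (have_first)
                forest.unite(first, it->second);  // same group, same set
            else {
                first = it->second;
                have_first = true;
            }
        }
    }

    // Collect the integers of each set under the set's root.
    std::map<std::size_t, std::set<int> > by_root;
    for (std::map<int, std::size_t>::iterator it = node_of.begin();
         it != node_of.end(); ++it)
        by_root[forest.find(it->second)].insert(it->first);

    std::vector<std::set<int> > result;
    for (std::map<std::size_t, std::set<int> >::iterator it = by_root.begin();
         it != by_root.end(); ++it)
        result.push_back(it->second);
    return result;
}
```

Calling merge_groups on the four pairs from the question should give two sets, {1, 2, 3} and {7, 8, 20}. Because merging happens inside the forest, a group that shares elements with two previously separate sets joins them into a single set.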