I have a data.frame with 1,000 rows and 3 columns. It contains a large number of duplicates and I’ve used plyr to combine the duplicate rows and add a count for each combination as explained in this thread.
Here’s an example of what I have now (I still also have the original data.frame with all of the duplicates if I need to start from there):
  name1 name2 name3 total
1   Bob  Fred   Sam    30
2   Bob   Joe Frank    20
3 Frank   Sam   Tom    25
4   Sam   Tom Frank    10
5  Fred   Bob   Sam    15
However, column order doesn’t matter. I just want to know how many rows have the same three entries, in any order. How can I combine the rows that contain the same entries, ignoring order? In this example I would want to combine rows 1 and 5, and rows 3 and 4.
Define another column that’s a “sorted paste” of the names, which would have the same value of “Bob~Fred~Sam” for rows 1 and 5. Then aggregate based on that.
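A minimal sketch of that idea in base R, using the example rows from the question. The data frame name dd and the column name key are assumptions chosen for illustration; the "~" separator follows the example above.

```r
# Example data copied from the question; `dd` is an assumed name.
dd <- data.frame(
  name1 = c("Bob", "Bob", "Frank", "Sam", "Fred"),
  name2 = c("Fred", "Joe", "Sam", "Tom", "Bob"),
  name3 = c("Sam", "Frank", "Tom", "Frank", "Sam"),
  total = c(30, 20, 25, 10, 15),
  stringsAsFactors = FALSE
)

# Sort the three names within each row and paste them into a single key,
# so rows with the same names in any order get the same key.
name_cols <- c("name1", "name2", "name3")
dd$key <- apply(dd[, name_cols], 1, function(r) paste(sort(r), collapse = "~"))

# Sum the counts for each order-independent key.
combined <- aggregate(total ~ key, data = dd, FUN = sum)
print(combined)
```

Rows 1 and 5 share the key "Bob~Fred~Sam" and rows 3 and 4 share "Frank~Sam~Tom", so combined has one row per distinct set of names.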
Brief code snippet (assumes the original data frame is dd): it's all really intuitive. We create a lookup column (take a look; it should be self-explanatory), get the sums of the total column for each combination, and then filter down to the unique combinations…
You now have, in ee, a set of unique rows and their corresponding total counts. Easy, and no external packages needed. And crucially, you can see at every stage of the process what is going on!
(Minor update to help OP:) And if you want a cleaned-up version of the final answer:
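The code from this answer is not reproduced here, so what follows is only a sketch of the steps it describes, in base R. The names dd, lookup, newtotal, ee and total come from the answer's text; the "~" separator, the use of ave() and duplicated(), and the name final are assumptions, and the answer's own code may well have differed.

```r
# Example data copied from the question.
dd <- data.frame(
  name1 = c("Bob", "Bob", "Frank", "Sam", "Fred"),
  name2 = c("Fred", "Joe", "Sam", "Tom", "Bob"),
  name3 = c("Sam", "Frank", "Tom", "Frank", "Sam"),
  total = c(30, 20, 25, 10, 15),
  stringsAsFactors = FALSE
)
name_cols <- c("name1", "name2", "name3")

# Step 1: lookup column holding the row's names in sorted order.
dd$lookup <- apply(dd[, name_cols], 1, function(r) paste(sort(r), collapse = "~"))

# Step 2: sum of `total` within each lookup group, repeated on every row of the group.
dd$newtotal <- ave(dd$total, dd$lookup, FUN = sum)

# Step 3: keep only the first row of each combination.
ee <- dd[!duplicated(dd$lookup), ]

# Cleaned-up version: the three name columns plus the aggregated count,
# stored under the name `total` instead of `newtotal`.
final <- ee[, name_cols]
final$total <- ee$newtotal
rownames(final) <- NULL
print(final)
```

Because duplicated() keeps the first occurrence, each combination appears with the name order of its first row (for example Bob, Fred, Sam rather than Fred, Bob, Sam).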
This gives you a neat data frame with the three all-important name columns, and with the aggregated totals in a column called total rather than newtotal.