I have a data frame that looks like this:
  | id | text
1 | 7821 | "some text here"
2 | 7821 | "here as well"
3 | 7821 | "and here"
4 | 567 | "etcetera"
5 | 567 | "more text"
6 | 231 | "other text"
And I would like to group the texts by IDs, so I can run a clustering algorithm:
  | id | text
1 | 7821 | "some text here here as well and here"
2 | 567 | "etcetera more text"
3 | 231 | "other text"
Is there any way to do this? I am importing from a database table and I have a lot of data, so I can’t do it manually.
You’re actually looking for aggregate, not merge, and there should be many examples on SO demonstrating different options for aggregation. Here’s the most basic and direct approach, using the formula interface to specify which columns to aggregate.
Here’s your data in a copy-and-pasteable form:
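As a rough sketch, the sample rows from the question can be rebuilt like this. The object name df and the stringsAsFactors = FALSE setting are assumptions on my part, not something given in the question.

```r
# Sample rows from the question; the name `df` is an assumption.
df <- data.frame(
  id   = c(7821, 7821, 7821, 567, 567, 231),
  text = c("some text here", "here as well", "and here",
           "etcetera", "more text", "other text"),
  stringsAsFactors = FALSE  # keep text as character (already the default in R >= 4.0)
)
df
```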
Here’s the aggregated output.
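A minimal sketch of a call that should produce it, assuming the columns are named id and text as in the question. The data frame is rebuilt here so the snippet runs on its own, and the names df and out are only illustrative.

```r
# Sample rows from the question, rebuilt so this snippet is self-contained.
df <- data.frame(
  id   = c(7821, 7821, 7821, 567, 567, 231),
  text = c("some text here", "here as well", "and here",
           "etcetera", "more text", "other text"),
  stringsAsFactors = FALSE
)

# For each id, paste all of its texts into one space-separated string.
# `collapse = " "` is passed through aggregate() to paste().
out <- aggregate(text ~ id, data = df, FUN = paste, collapse = " ")
out
```

Note that aggregate returns the groups sorted by id, so 231 comes first rather than 7821. The row for 7821 should read "some text here here as well and here".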
Of course, there is also data.table, which has nice compact syntax (and awesome speed):
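One way this might look, as a sketch that assumes the data.table package is installed and the same id and text columns. The names dt and res are only illustrative.

```r
library(data.table)

# Sample rows from the question as a data.table.
dt <- data.table(
  id   = c(7821, 7821, 7821, 567, 567, 231),
  text = c("some text here", "here as well", "and here",
           "etcetera", "more text", "other text")
)

# Group by id and collapse each group's texts into a single string.
res <- dt[, .(text = paste(text, collapse = " ")), by = id]
res
```

With by, the groups stay in order of first appearance (7821, 567, 231), which matches the layout asked for in the question. Using keyby instead would sort them by id.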