I have a data frame like this:
FisherID Year Month VesselID
1 2000 1 56
1 2000 1 81
1 2000 2 81
1 2000 3 81
1 2000 4 81
1 2000 5 81
1 2000 6 81
1 2000 7 81
1 2000 8 81
1 2000 9 81
1 2000 10 81
1 2001 1 56
1 2001 2 56
1 2001 3 81
1 2001 4 56
1 2001 5 56
1 2001 6 56
1 2001 7 56
1 2002 3 81
1 2002 4 81
1 2002 5 81
1 2002 6 81
1 2002 7 81
…and I need the number of times that the VesselID changes per year, so the output that I want is:
FisherID Year DiffVesselUsed
1 2000 1
1 2001 2
1 2002 0
I tried to get that using aggregate():
aggregate(vesselID, by=list(FisherID,Year,Month ), length)
but what I got was:
FisherID Year DiffVesselUsed
1 2000 2
1 2001 1
1 2002 1
because aggregate() counted different vessels only when they appeared in the same month. I have tried different ways to aggregate without success. Any help will be deeply appreciated. Cheers, Rafael
First, a question: your expected output doesn't seem to reflect what you ask for. You ask for the number of times an ID changes per year, but your expected output seems to indicate that you want to know how many unique VesselIDs are observed per year. For example, in 2000, the ID changes once, and in 2001 the ID changes twice. In both years, two unique IDs are observed.
So to get the result you posted,
If you're looking for a statistic by FisherID and Year, then there's no reason to look by Month as well. Instead, you should look at the unique values of VesselID for each combination of FisherID and Year.
If you really want the number of times the ID changes, use the rle function.
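The two approaches described above might look something like the sketch below. It is not the answer's original code. It rebuilds the example data from the question, and it assumes the rows should be in month order within each FisherID and Year before changes are counted. The names df, n_unique and n_changes are made up for illustration.

```r
# Rebuild the example data from the question (column names as posted).
df <- data.frame(
  FisherID = 1,
  Year     = rep(c(2000, 2001, 2002), times = c(11, 7, 5)),
  Month    = c(1, 1:10, 1:7, 3:7),
  VesselID = c(56, rep(81, 10), 56, 56, 81, 56, 56, 56, 56, rep(81, 5))
)

# Assumption: changes are counted in month order, so sort first.
# Rows tied on all three keys keep their original relative order.
df <- df[order(df$FisherID, df$Year, df$Month), ]

# Number of unique vessels per FisherID and Year (Month is not a grouping key).
n_unique <- aggregate(VesselID ~ FisherID + Year, data = df,
                      FUN = function(v) length(unique(v)))

# Number of times the vessel changes per FisherID and Year:
# rle() collapses consecutive repeats into runs, so changes = runs - 1.
n_changes <- aggregate(VesselID ~ FisherID + Year, data = df,
                       FUN = function(v) length(rle(v)$lengths) - 1)
names(n_changes)[3] <- "DiffVesselUsed"

print(n_unique)   # VesselID column: 2, 2, 1
print(n_changes)  # DiffVesselUsed column: 1, 2, 0
```

On the data as posted, the rle version gives 1, 2, 0 for 2000, 2001 and 2002, which matches the desired output in the question. The unique-count version gives 2, 2, 1.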