First things first. Here are my data:
lat <- c(12, 12, 58, 58, 58, 58, 58, 45, 45, 45, 45, 45, 45, 64, 64, 64, 64, 64, 64, 64)
long <- c(-14, -14, 139, 139, 139, 139, 139, -68, -68, -68, -68, -68, 1, 1, 1, 1, 1, 1, 1, 1)
sex <- c("M", "M", "M", "M", "F", "M", "M", "F", "M", "M", "M", "F", "M", "F", "M", "F", "F", "F", "F", "M")
score <- c(2, 6, 3, 6, 5, 4, 3, 2, 3, 9, 9, 8, 6, 5, 6, 7, 5, 7, 5, 1)
data <- data.frame(lat, long, sex, score)
The data should look like this:
lat long sex score
1 12 -14 M 2
2 12 -14 M 6
3 58 139 M 3
4 58 139 M 6
5 58 139 F 5
6 58 139 M 4
7 58 139 M 3
8 45 -68 F 2
9 45 -68 M 3
10 45 -68 M 9
11 45 -68 M 9
12 45 -68 F 8
13 45 1 M 6
14 64 1 F 5
15 64 1 M 6
16 64 1 F 7
17 64 1 F 5
18 64 1 F 7
19 64 1 F 5
20 64 1 M 1
I am at my wits' end trying to figure this one out. The variables are latitude, longitude, sex and score. I would like to have an equal number of males and females within each location (i.e. rows with the same longitude and latitude). For instance, the second location (rows 3 to 7) has only one female. This female should be retained, and one male from the remaining individuals should also be retained (by random sampling, perhaps). Some locations have information about only one sex, e.g. the first location (rows 1 and 2) has only data on males. The rows from such a location should be dropped (since there are no females). If all goes according to plan, the final dataset should look something like this:
lat2 long2 sex2 score2
1 58 139 F 5
2 58 139 M 4
3 45 -68 F 2
4 45 -68 M 3
5 45 -68 M 9
6 45 -68 F 8
7 64 1 M 6
8 64 1 F 5
9 64 1 F 7
10 64 1 M 1
Any help would be appreciated.
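A quick way to go about it is to create a temporary column holding the lat-long combination. We split the data frame according to this column, count the males and females in each split, randomly sample the more numerous sex down to the size of the less numerous one (dropping any location where either count is zero), then recombine.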
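The steps described here can be sketched in base R roughly as follows. This is only one possible implementation: the function name `balance_sexes` and the helper variable `key` are my own hypothetical choices, and the sketch assumes the columns are named `lat`, `long` and `sex` as in the question.

```r
# Example data from the question
lat   <- c(12, 12, 58, 58, 58, 58, 58, 45, 45, 45, 45, 45, 45, 64, 64, 64, 64, 64, 64, 64)
long  <- c(-14, -14, 139, 139, 139, 139, 139, -68, -68, -68, -68, -68, 1, 1, 1, 1, 1, 1, 1, 1)
sex   <- c("M", "M", "M", "M", "F", "M", "M", "F", "M", "M", "M", "F", "M", "F", "M", "F", "F", "F", "F", "M")
score <- c(2, 6, 3, 6, 5, 4, 3, 2, 3, 9, 9, 8, 6, 5, 6, 7, 5, 7, 5, 1)
data  <- data.frame(lat, long, sex, score)

# Hypothetical helper: keep an equal number of males and females per location
balance_sexes <- function(df) {
  # Temporary key identifying each lat-long combination
  key <- paste(df$lat, df$long, sep = "_")

  # Split the row indices by location, in order of first appearance
  groups <- split(seq_len(nrow(df)), factor(key, levels = unique(key)))

  keep <- lapply(groups, function(idx) {
    f <- idx[df$sex[idx] == "F"]
    m <- idx[df$sex[idx] == "M"]
    n <- min(length(f), length(m))   # size of the smaller sex
    if (n == 0) return(integer(0))   # drop locations missing a sex
    # Sample positions with sample.int(): sample(x, n) on a single
    # number x would draw from 1:x instead of returning x itself
    c(f[sample.int(length(f), n)], m[sample.int(length(m), n)])
  })

  # Recombine, keeping the original row order
  out <- df[sort(unlist(keep, use.names = FALSE)), ]
  rownames(out) <- NULL
  out
}

set.seed(1)   # only so that the random sampling is reproducible
balanced <- balance_sexes(data)
balanced
```

Because the rows are sampled at random, which males or females are retained at a given location will vary with the seed, but each retained location ends up with the same number of each sex, and locations with only one sex are dropped.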
RESULTS