First things, first. Here are my data: lat <- c(12, 12, 58, 58, 58,

Question

0

Asked: June 15, 20262026-06-15T16:48:50+00:00 2026-06-15T16:48:50+00:00

First things, first. Here are my data: lat <- c(12, 12, 58, 58, 58,

0

First things, first. Here are my data:

lat <- c(12, 12, 58, 58, 58, 58, 58, 45, 45, 45, 45, 45, 45, 64, 64, 64, 64, 64, 64, 64)
long <- c(-14, -14, 139, 139, 139, 139, 139, -68, -68, -68, -68, -68, 1, 1, 1, 1, 1, 1, 1, 1)
sex <- c("M", "M", "M", "M", "F", "M", "M", "F", "M", "M", "M", "F", "M", "F", "M", "F", "F", "F", "F", "M")
score <- c(2, 6, 3, 6, 5, 4, 3, 2, 3, 9, 9, 8, 6, 5, 6, 7, 5, 7, 5, 1)

data <- data.frame(lat, long, sex, score)

The data should look like this:

   lat long sex score
1   12  -14   M     2
2   12  -14   M     6
3   58  139   M     3
4   58  139   M     6
5   58  139   F     5
6   58  139   M     4
7   58  139   M     3
8   45  -68   F     2
9   45  -68   M     3
10  45  -68   M     9
11  45  -68   M     9
12  45  -68   F     8
13  45    1   M     6
14  64    1   F     5
15  64    1   M     6
16  64    1   F     7
17  64    1   F     5
18  64    1   F     7
19  64    1   F     5
20  64    1   M     1

I am at my wits end trying to figure this one out. The variables are latitude, longitude, sex and score. I would like to have an equal number of males and females within each location (i.e. with the same longitude and latitude). For instance, the second location (rows 3 to 7) has only one female. This female should be retained and one male from the remaining individuals should also be retained (by random sampling, perhaps). Some locations have only information about one sex, e.g. the first location (rows 1 and 2) has only data on males. The rows from this location should be dropped (since there are no females). All going according to plan the final dataset should look something like this:

   lat2 long2 sex2 score2
1    58   139    F      5
2    58   139    M      4
3    45   -68    F      2
4    45   -68    M      3
5    45   -68    M      9
6    45   -68    F      8
7    64     1    M      6
8    64     1    F      5
9    64     1    F      7
10   64     1    M      1

Any help would be appreciated.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-15T16:48:51+00:00

Below is a quick way to go about it, which involves creating a temporary column of the lat-long combination. We split the DF according to this column, count the M/F in each split, sample appropriately, then re-combine.

# First, We call the dataframe something other than "data" ;) 
mydf <- data.frame(lat, long, sex, score)

# create a new data frame with a temporary column, which concatenates the lat & long. 
mydf.new <- data.frame(mydf, latlong=paste(mydf$lat, mydf$long, sep=","))

# Split the data frame according to the lat-long location
mydf.splat <- split(mydf.new, mydf.new$latlong)

# eg, taking a look at one of our tables:
mydf.splat[[4]]

sampled <- 
  lapply(mydf.splat, function(tabl) {
    Ms <- sum(tabl$sex=="M")
    Fs <- sum(tabl$sex=="F")

    if(Fs == 0 || Ms ==0)    # If either is zero, we drop that location
      return(NULL)

    if(Fs == Ms)   # If they are both equal, no need to sample. 
      return(tabl)

    # If number of Females less than Males, return all Females 
    #    and sample from males in ammount equal to Females
    if (Fs < Ms)   
      return(tabl[c(which(tabl$sex=="F"), sample(which(tabl$sex=="M"), Fs)),   ])

    if (Ms < Fs)  # same as previous, but for Males < Femals
      return(tabl[c(which(tabl$sex=="M"), sample(which(tabl$sex=="F"), Ms)),   ])

    stop("hmmm... something went wrong.")  ## We should never hit this line, but just in case. 
  })

# Flatten into a single table
mydf.new <- do.call(rbind, sampled)

# Clean up
row.names(mydf.new) <- NULL  # remove the row names that were added
mydf.new$latlong <- NULL     # remove the temporary column that we added

RESULTS

mydf.new

#    lat long sex score
# 1   45  -68   F     2
# 2   45  -68   F     8
# 3   45  -68   M     9
# 4   45  -68   M     3
# 5   58  139   F     5
# 6   58  139   M     3
# 7   64    1   M     6
# 8   64    1   M     1
# 9   64    1   F     7
# 10  64    1   F     5

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

First things, first. Here are my data: lat <- c(12, 12, 58, 58, 58,

Leave an answerCancel reply

1 Answer

RESULTS

Leave an answer
Cancel reply