The Archive Base

Editorial Team
Asked: May 31, 2026

I would like to select a subset of elements from a whole that satisfy

I would like to select a subset of elements from a whole that satisfy certain conditions. There are about 20 elements, each having multiple attributes. I would like to select the five elements that show the least discrepancy from a fixed criterion on one attribute and offer the highest average value on another attribute.

Lastly, I would like to apply the function over multiple sets of 20 elements.

Thus far, I have been able to identify the subsets “by hand,” but I’d like to be able to return the index of the values in addition to returning the values themselves.

Objectives:

  1. I would like to find the set of five values for X1 that are the least discrepant from a fixed value (55), and provide the largest value for the average of X2.

  2. I would like to do this for multiple sets.


#####  generating example data
#####  this has five groups, each with two variables x1 and x2
set.seed(271828)

grp <- gl(5,20)
x1 <- round(rnorm(100,45, 12), digits=0)
x2 <- round(rbeta(100,2,4), digits = 2)
id <- seq(1,100,1)

#####  this is how the data would arrive for me to analyze
dat <- as.data.frame(cbind(id,grp,x1,x2))

The data would arrive in this format, with id as a unique identifier for each element.


#####  pulling out the first group for demonstration
dat.grp.1 <- dat[ which(grp == 1), ]

crit <- 55
x <- t(combn(dat.grp.1$x1, 5))
y <- t(combn(dat.grp.1$x2, 5))

mean.x <- rowMeans(x)
mean.y <- rowMeans(y)
k <- (mean.x - crit)^2

out <- cbind(x, mean.x, k, y, mean.y)

#####  finding the sets with the least amount of discrepancy
pick <- out[ which(k == min(k)), ]
pick

#####  finding the sets with low discrepancy and high values of y (means of X2) by "hand"
sorted <- out[order(k), ]
head(sorted, n=20)

With respect to the values in pick, I can see that the values of X1 are:

> pick
                    mean.x  k                          mean.y
[1,] 55 47 48 48 52     50 25 0.62 0.08 0.31 0.18 0.54  0.346
[2,] 55 48 48 47 52     50 25 0.62 0.31 0.18 0.48 0.54  0.426

I would like to return the id value for these elements, so that I know I picked elements 3, 8, 10, 11, and 18 (choosing set 2, since the discrepancy k is the same for both sets but the mean of y is higher).

> dat.grp.1 
    id grp x1   x2
 1   1   1 45 0.12
 2   2   1 27 0.34
 3   3   1 55 0.62
 4   4   1 39 0.32
 5   5   1 41 0.18
 6   6   1 29 0.47
 7   7   1 47 0.08
 8   8   1 48 0.31
 9   9   1 35 0.48
10  10   1 48 0.18
11  11   1 47 0.48
12  12   1 31 0.29
13  13   1 39 0.15
14  14   1 36 0.54
15  15   1 36 0.20
16  16   1 38 0.40
17  17   1 30 0.31
18  18   1 52 0.54
19  19   1 44 0.37
20  20   1 31 0.20

Doing this “by hand” works for now, but it would be good to make this as “hands-off” as possible.

Any help is greatly appreciated.

1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 11:11 pm

    You are almost there. You can change your definition of sorted to

    sorted <- out[order(k, -mean.y), ]
    

    And then sorted[1,] (or if you prefer sorted[1,,drop=FALSE]) is your selected set.
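
    As a minimal sketch of how the two-key ordering behaves, here is a toy example with made-up k and mean.y values (they are illustrative only, not taken from your data). order sorts ascending on its first argument and uses later arguments only to break ties, so negating mean.y puts the largest mean first among rows with equal k.

    ```r
    # Toy values (hypothetical, not from the question's data)
    k      <- c(25, 0, 25, 0, 4)
    mean.y <- c(0.35, 0.30, 0.43, 0.51, 0.90)

    # Ascending in k; ties in k are broken by descending mean.y
    ord <- order(k, -mean.y)
    ord

    # Same ordering without negating, using a per-key direction
    # (a vector for 'decreasing' requires method = "radix")
    ord.radix <- order(k, mean.y, decreasing = c(FALSE, TRUE), method = "radix")

    # Position of the preferred row: smallest k, then largest mean.y
    best <- ord[1]
    best
    ```

    Here the rows with k equal to 0 come first, and among those the one with the larger mean.y wins.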

    If you want the indexes rather than/in addition to the points, then you can include that earlier. Replace:

    x <- t(combn(dat.grp.1$x1, 5))
    y <- t(combn(dat.grp.1$x2, 5))
    

    with

    idx <- t(combn(1:nrow(dat.grp.1), 5))
    x <- t(apply(idx, 1, function(i) {dat.grp.1[i,"x1"]}))
    y <- t(apply(idx, 1, function(i) {dat.grp.1[i,"x2"]}))
    

    and include idx in out later.

    Putting it all together:

    #####  pulling out the first group for demonstration
    dat.grp.1 <- dat[ which(grp == 1), ]
    
    crit <- 55
    idx <- t(combn(1:nrow(dat.grp.1), 5))
    x <- t(apply(idx, 1, function(i) {dat.grp.1[i,"x1"]}))
    y <- t(apply(idx, 1, function(i) {dat.grp.1[i,"x2"]}))
    
    mean.x <- rowMeans(x)
    mean.y <- rowMeans(y)
    k <- (mean.x - crit)^2
    
    out <- cbind(idx, x, mean.x, k, y, mean.y)
    
    #####  finding the sets with the least amount of discrepancy and among
    ##### those the largest second mean
    pick <- out[order(k, -mean.y)[1],,drop=FALSE]
    pick
    

    which gives

                                     mean.x  k                          mean.y
    [1,] 3 8 10 11 18 55 48 48 47 52     50 25 0.62 0.31 0.18 0.48 0.54  0.426
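
    One caveat: the first five columns of pick are row positions within dat.grp.1, not id values. They coincide here only because group 1 happens to hold ids 1 to 20. A small sketch of mapping positions back to id, using the group 1 values from your printout; the copy named shifted, with ids moved up by 20, is a hypothetical stand-in for a later group:

    ```r
    # Group 1 values copied from the question's printout
    dat.grp.1 <- data.frame(
        id  = 1:20,
        grp = 1,
        x1  = c(45, 27, 55, 39, 41, 29, 47, 48, 35, 48,
                47, 31, 39, 36, 36, 38, 30, 52, 44, 31),
        x2  = c(0.12, 0.34, 0.62, 0.32, 0.18, 0.47, 0.08, 0.31, 0.48, 0.18,
                0.48, 0.29, 0.15, 0.54, 0.20, 0.40, 0.31, 0.54, 0.37, 0.20)
    )

    # Row positions of the selected set (first five columns of pick)
    rows <- c(3, 8, 10, 11, 18)

    # Look up the ids and the full records at those positions
    ids <- dat.grp.1$id[rows]
    dat.grp.1[rows, ]

    # Hypothetical later group: same values, ids 21..40.
    # The row positions stay the same, but the ids do not.
    shifted <- transform(dat.grp.1, id = id + 20L)
    shifted.ids <- shifted$id[rows]
    ```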
    

    EDIT: A description of applying over idx was requested. I want more formatting options than a comment allows, so I'm adding it to my answer. I will also address looping over subsets.

    idx is a matrix (15504 x 5), each row of which is a set of five indexes into the data frame. apply goes through it row by row (margin 1) and does something with each row: it takes the values, uses them to index the desired rows of dat.grp.1, and pulls out the corresponding x1 values. I could have written dat.grp.1[i,"x1"] as dat.grp.1$x1[i]. apply returns the result for each row of idx as a column, so the whole thing needs to be transposed.
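
    To see the transposition on something small, here is a sketch that uses only the first six x1 values of group 1, so idx has 6 rows instead of 15504. The last line shows an alternative I did not use above: indexing the vector with the whole matrix at once and reshaping, which should give the same result without apply.

    ```r
    x1  <- c(45, 27, 55, 39, 41, 29)          # first six x1 values of group 1
    idx <- t(combn(seq_along(x1), 5))         # 6 x 5 matrix of index sets

    # apply over rows: each row's result comes back as a COLUMN (5 x 6)
    raw <- apply(idx, 1, function(i) x1[i])
    dim(raw)

    # Transpose so that each row again corresponds to one index set (6 x 5)
    x.apply <- t(raw)

    # Alternative: index with the whole matrix, then restore its shape
    x.direct <- matrix(x1[idx], nrow = nrow(idx))
    ```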

    You can break the loop apart to see how each step works if you like. Make the function into a non-anonymous function.

    f <- function(i) {dat.grp.1[i,"x1"]}
    

    and pass one row of idx at a time to it.

    > f(idx[1,])
    [1] 45 27 55 39 41
    > f(idx[2,])
    [1] 45 27 55 39 29
    > f(idx[3,])
    [1] 45 27 55 39 47
    > f(idx[4,])
    [1] 45 27 55 39 48
    

    These are what get bundled into x

    > head(x,4)
         [,1] [,2] [,3] [,4] [,5]
    [1,]   45   27   55   39   41
    [2,]   45   27   55   39   29
    [3,]   45   27   55   39   47
    [4,]   45   27   55   39   48
    

    As for looping over subsets, the plyr package is very handy for this. The way you have set it up (assign the subset of interest to a variable and work with that) makes the transformation easy. Everything you do to create the answer for one subset goes into a function that takes that subset as a parameter.

    find.best.set <- function(dat.grp.1) {
        crit <- 55
        idx <- t(combn(1:nrow(dat.grp.1), 5))
        x <- t(apply(idx, 1, function(i) {dat.grp.1[i,"x1"]}))
        y <- t(apply(idx, 1, function(i) {dat.grp.1[i,"x2"]}))
    
        mean.x <- rowMeans(x)
        mean.y <- rowMeans(y)
        k <- (mean.x - crit)^2
    
        out <- cbind(idx, x, mean.x, k, y, mean.y)
    
        out[order(k, -mean.y)[1],,drop=FALSE]
    }
    

    This is basically what you had before, but getting rid of some unnecessary assignments.

    Now wrap this in a plyr call.

    library("plyr")
    ddply(dat, .(grp), find.best.set)
    

    which gives

      grp V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12  V13  V14  V15  V16  V17   V18
    1   1  3  8 10 11 18 55 48 48 47  52  50  25 0.62 0.31 0.18 0.48 0.54 0.426
    2   2  8 10 12 15 16 53 35 55 76  56  55   0 0.71 0.20 0.43 0.50 0.70 0.508
    3   3  4 10 15 17 20 47 48 73 55  52  55   0 0.67 0.54 0.28 0.42 0.31 0.444
    4   4  2 11 13 17 19 47 46 70 62  50  55   0 0.35 0.47 0.18 0.13 0.47 0.320
    5   5  3  6 10 17 19 72 40 58 66  39  55   0 0.33 0.42 0.32 0.32 0.51 0.380
    

    I don’t know that that is the best format for your results, but it mirrors the example you gave.
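
    If the id values are what you need in the end, one possible variant (a sketch, not run against your real data) returns them directly and uses base R's split and lapply instead of plyr. The function name find.best.ids, its crit and m arguments, and the second group with shifted ids are my own illustrative choices; the group 1 values are copied from your printout.

    ```r
    # Group 1 values copied from the question's printout
    g1 <- data.frame(
        id  = 1:20,
        grp = 1,
        x1  = c(45, 27, 55, 39, 41, 29, 47, 48, 35, 48,
                47, 31, 39, 36, 36, 38, 30, 52, 44, 31),
        x2  = c(0.12, 0.34, 0.62, 0.32, 0.18, 0.47, 0.08, 0.31, 0.48, 0.18,
                0.48, 0.29, 0.15, 0.54, 0.20, 0.40, 0.31, 0.54, 0.37, 0.20)
    )
    # Hypothetical second group: same values, ids 21..40
    g2  <- transform(g1, id = id + 20L, grp = 2)
    dat <- rbind(g1, g2)

    find.best.ids <- function(d, crit = 55, m = 5) {
        # All m-element sets of row positions within this group
        idx <- t(combn(seq_len(nrow(d)), m))

        # Means of x1 and x2 for every candidate set
        mean.x <- rowMeans(matrix(d$x1[idx], nrow = nrow(idx)))
        mean.y <- rowMeans(matrix(d$x2[idx], nrow = nrow(idx)))
        k <- (mean.x - crit)^2

        # Smallest discrepancy first, largest mean of x2 among ties
        best <- order(k, -mean.y)[1]

        # Report ids rather than row positions
        data.frame(grp    = d$grp[1],
                   ids    = paste(d$id[idx[best, ]], collapse = ","),
                   mean.x = mean.x[best],
                   k      = k[best],
                   mean.y = mean.y[best])
    }

    # One result row per group, stacked into a single data frame
    res <- do.call(rbind, lapply(split(dat, dat$grp), find.best.ids))
    res
    ```

    Collapsing the ids into one string keeps a single row per group; if you would rather have one column per selected element, return d$id[idx[best, ]] as separate columns instead.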

© 2021 The Archive Base. All Rights Reserved