The Archive Base

Editorial Team
Asked: June 5, 2026 (2026-06-05T09:52:55+00:00)

I have a vector of values, call it X, and a data frame, call it dat.fram. I want to run something like “grep” or “which” to find, for each element of X, all the indices of dat.fram[,3] that match it.

Below is the very inefficient for loop I use at the moment. Note that X has many observations (150,000), and each member of “match.ind” can have zero or more matches. Also, dat.fram has over 1 million observations. Is there a vectorized function in R that would make this process more efficient?

Ultimately, I need a list, since I will pass it to another function that retrieves the appropriate values from dat.fram.

Code:

match.ind=list()

for(i in 1:150000){
    match.ind[[i]]=which(dat.fram[,3]==X[i])
}
1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 9:52 am

    UPDATE:

    Ok, wow, I just found an awesome way of doing this… it’s really slick. Wondering if it’s useful in other contexts…?!

    ### define v as a sample column of data - you should define v to be
    ### the column in the data frame you mentioned (dat.fram[,3])
    
    v = sample(1:150000, 1500000, replace=TRUE)
    
    ### now here's the trick: concatenate the indices for each possible value of v,
    ### to form mybiglist - the names of mybiglist give you the possible values
    ### of v, and the values in mybiglist give you the index points
    
    mybiglist = tapply(seq_along(v),v,c)
    
    ### now you just want the parts of this that intersect with X... again I'll
    ### generate a random X but use whatever X you need to
    
    X = sample(1:200000, 150000)
    mylist = mybiglist[names(mybiglist)%in%X]
    

    And that’s it! As a check, let’s look at the first 3 elements of mylist:

    > mylist[1:3]
    
    $`1`
    [1]  401143  494448  703954  757808 1364904 1485811
    
    $`2`
    [1]  230769  332970  389601  582724  804046  997184 1080412 1169588 1310105
    
    $`4`
    [1]  149021  282361  289661  456147  774672  944760  969734 1043875 1226377
    

    There’s a gap at 3, as 3 doesn’t appear in X (even though it occurs in v). And the
    numbers listed against 4 are the index points in v where 4 appears:

    > which(X==3)
    integer(0)
    
    > which(v==3)
    [1]  102194  424873  468660  593570  713547  769309  786156  828021  870796  
    883932 1036943 1246745 1381907 1437148
    
    > which(v==4)
    [1]  149021  282361  289661  456147  774672  944760  969734 1043875 1226377
    

    Finally, it’s worth noting that values that appear in X but not in v won’t have an entry in the list, but this is presumably what you want anyway as they’re NULL!
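
For anyone who wants to see the grouping step on data small enough to check by eye, here is a minimal sketch. It is an illustration rather than the code from the answer: it swaps tapply(..., c) for base R's split(), which builds the same value-to-positions grouping as a plain list, and the tiny v and X are made up for the example.

```r
# Tiny fixed stand-in for dat.fram[, 3] (values chosen only for illustration)
v <- c(4L, 2L, 4L, 1L, 2L, 4L)
X <- c(4L, 3L, 1L)  # 3 does not occur in v

# One pass over v: each distinct value maps to the positions where it occurs
index_by_value <- split(seq_along(v), v)

# Keep only the groups whose value is requested in X
mylist <- index_by_value[names(index_by_value) %in% X]

print(mylist)
print(is.null(mylist[["3"]]))
```

Because split() returns a plain list, looking up a value that has no entry (mylist[["3"]] here) gives NULL rather than an error.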

    Extra note: You can use the code below to create an NA entry for each member of X not in v…

    blanks = sort(setdiff(X,names(mylist)))
    mylist_extras = rep(list(NA),length(blanks))
    names(mylist_extras) = blanks
    mylist_all = c(mylist,mylist_extras)
    mylist_all = mylist_all[order(as.numeric(names(mylist_all)))]
    

    Fairly self-explanatory: mylist_extras is a list holding the extra entries you need (its names are the values of X that don’t feature in names(mylist), and each entry is simply NA). The final two lines first merge mylist and mylist_extras, then reorder the result so that the names in mylist_all are in numeric order. These names should then match exactly the sorted unique values in the vector X.
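
If what you need is the exact layout of the loop in the question (one entry per element of X, in X's order, with an empty result where nothing matches), a sketch along these lines might do it. Again this uses split() rather than the tapply call above, the small v and X are invented for the example, and the name index_by_value is just a placeholder.

```r
# Small made-up data: 3 is absent from v, and 4 is requested twice
v <- c(4L, 2L, 4L, 1L, 2L, 4L)
X <- c(4L, 3L, 1L, 4L)

# Group the positions of v by value, once
index_by_value <- split(seq_along(v), v)

# Look up each element of X by name; values absent from v come back as NULL
match.ind <- unname(index_by_value[as.character(X)])

# Replace the NULL slots with integer(0), which is what which() returns
match.ind[lengths(match.ind) == 0] <- list(integer(0))

# Reference result: the question's loop, written with lapply
loop_result <- lapply(X, function(x) which(v == x))
print(identical(match.ind, loop_result))
```

This assumes integer-valued data, so that as.character(X) produces the same labels that split() uses for the values of v.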

    Cheers! 🙂


    ORIGINAL POST BELOW… superseded by the above, obviously!

    Here’s a toy example with tapply that might well run significantly quicker… I made X and d relatively small so you could see what’s going on:

    X = 3:7
    n = 100
    d = data.frame(a = sample(1:10,n,replace=TRUE), b = sample(1:10,n,replace=TRUE), 
                   c = sample(1:10,n,replace=TRUE), stringsAsFactors = FALSE)
    
    tapply(X,X,function(x) {which(d[,3]==x)})
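
For comparison, the same per-value lookup can be written with lapply and setNames. This is only a sketch of the idea in the toy example above, with an arbitrary seed added so the data is reproducible. It still scans d[,3] once for every element of X, so the saving over the original loop is limited.

```r
set.seed(1)  # arbitrary seed, only to make the toy data reproducible
X <- 3:7
d <- data.frame(a = sample(1:10, 100, replace = TRUE),
                b = sample(1:10, 100, replace = TRUE),
                c = sample(1:10, 100, replace = TRUE))

# One which() scan of d[, 3] per element of X, named by the value searched for
per_value <- setNames(lapply(X, function(x) which(d[, 3] == x)), X)

str(per_value)
```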
    