Asked: June 13, 2026

I wish to filter a dataframe, original.data, in R. The dataframe could have around 1-2 million observations. The dataframe has several fields and the names may vary. The user can select which fields to filter by. These field names are stored in names(all.filters), where all.filters is a list of variable length. The user may then choose the levels for each of the fields in names(all.filters). For example, this list may look something like:

> all.filters
$Period
[1] "2010-12-31" "2011-03-31" "2011-06-30" "2011-09-30" "2011-12-31"
[6] "2012-03-31" "2012-06-30" "2012-09-30"

$Size
[1] "L"  "VL"

$Number
[1] "11" "21" "35" "42" "45" "47" "49" "52" "57"

I am using the following code to apply the chosen filters:

attach(original.data)    
filter.names <- names(all.filters)
flag <- 1
for(filter in filter.names){
   flag <- flag*(is.element(get(filter),all.filters[[filter]]))
}
filtered.data <- original.data[flag==1,]

This works, but it feels a little slow. Note that get(filter) retrieves the column of original.data with column name equal to filter. I’m not sure if this is a good way to filter the data, but the variable nature of all.filters limits my choices a bit – I wanted to use subset, but I’m not sure what to put as the select argument. I would like to make this filtering step as fast as possible so that when the user updates a filter selection, the data can be plotted quickly.

Once the data is filtered, I use reshape2 to summarise the data before plotting it with ggplot2. I am thinking that it may be more efficient to apply the filters at one of these steps if possible.

Any suggestions would be greatly appreciated.

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 2:29 pm

    You can use a data.table with appropriately set keys. This will be memory efficient.

    Then you can pass your list of filters as the i argument of [.data.table:

    .period <- seq(from = as.Date("2010/1/1", "%Y/%m/%d"), to = as.Date("2012/1/1", 
        "%Y/%m/%d"), by = "3 months")
    .size <- c("XS", "S", "M", "L", "XL")
    .number <- as.character(1:100)
    DF <- expand.grid(Period = .period, Size = .size, Number = .number, stringsAsFactors = FALSE)
    
    DF$other <- rnorm(nrow(DF))
    
    library(data.table)
    
    DT <- as.data.table(DF)
    
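    # convert the existing Date column to IDate in place
    # (assigning the 9-element .period relies on recycling, which recent data.table versions reject)
    DT[, Period := as.IDate(Period)]
    
    DT
    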
    ##           Period Size Number    other
    ##    1: 2010-01-01   XS      1  0.17947
    ##    2: 2010-04-01   XS      1  1.43252
    ##    3: 2010-07-01   XS      1 -0.97142
    ##    4: 2010-10-01   XS      1 -0.98021
    ##    5: 2011-01-01   XS      1 -0.62964
    ##   ---                                
    ## 4496: 2011-01-01   XL    100  0.65831
    ## 4497: 2011-04-01   XL    100 -0.45277
    ## 4498: 2011-07-01   XL    100 -0.14236
    ## 4499: 2011-10-01   XL    100 -0.02376
    ## 4500: 2012-01-01   XL    100 -0.11525
    
    all_filters <- list(Period = as.IDate(as.Date("2010/1/1", format = "%Y/%m/%d")), 
        Size = "L", Number = c("11", "21", "35", "42", "45", "47", "49", "52", "57"))
    
    
    setkeyv(DT, names(all_filters))
    
    DT[all_filters]
    
    ##        Period Size Number   other
    ## 1: 2010-01-01    L     11  1.4122
    ## 2: 2010-01-01    L     21 -0.4923
    ## 3: 2010-01-01    L     35  1.1262
    ## 4: 2010-01-01    L     42  1.3527
    ## 5: 2010-01-01    L     45 -0.3758
    ## 6: 2010-01-01    L     47 -0.1847
    ## 7: 2010-01-01    L     49 -0.8503
    ## 8: 2010-01-01    L     52 -1.0645
    ## 9: 2010-01-01    L     57 -0.6092
    

    The only issue I can see is that you will have to reset the key whenever the set of filter columns changes, so that the key matches the columns you are filtering on. You will also need to ensure that the filter values are the same class as the corresponding columns in the data.table; it may be easier to work with character rather than factor columns.
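
    As a rough illustration of the class-matching point above, the sketch below coerces filter values to the class of the column they apply to before they are used in a join. The helper name match_classes and the small example table are hypothetical, and only IDate and character columns are handled; other column types would need their own branch.

```r
library(data.table)

# Small example table; column names follow the question, values are made up.
DT <- data.table(
  Period = as.IDate(c("2010-12-31", "2011-03-31")),
  Size   = c("L", "VL"),
  Number = c("11", "21")
)

# Filters as they might arrive from a user interface: everything is character.
raw_filters <- list(Period = "2010-12-31", Size = "L", Number = "11")

# Hypothetical helper: coerce each filter vector to the class of its column.
match_classes <- function(dt, filters) {
  for (nm in names(filters)) {
    col <- dt[[nm]]
    if (inherits(col, "IDate")) {
      filters[[nm]] <- as.IDate(filters[[nm]])
    } else if (is.character(col)) {
      filters[[nm]] <- as.character(filters[[nm]])
    }
  }
  filters
}

all_filters <- match_classes(DT, raw_filters)
```

    The coerced list can then be used in the join in place of the raw one.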

    EDIT

    To filter on more than one level in more than one column, use CJ. CJ is a cross join (the data.table equivalent of expand.grid, with keys set).

    all_filters <- list(Period = as.IDate(as.Date("2010/1/1", format = "%Y/%m/%d")), 
      Size = c("L",'XL'), Number = c("11", "21", "35", "42", "45", "47", "49", "52", "57"))
    
    
    
    
    cj_filter <- do.call(CJ, all_filters)
    
    # note you could avoid this `do.call` line by calling CJ directly:
    # cj_filter <- CJ(Period = as.IDate(as.Date("2010/1/1", format = "%Y/%m/%d")), 
    #   Size = c("L",'XL'), Number = c("11", "21", "35", "42", "45", "47", "49", "52", "57"))
    
    setkeyv(DT, names(cj_filter))
    
    DT[cj_filter]
    
    ##         Period Size Number       other
    ##  1: 2010-01-01    L     11  0.36289104
    ##  2: 2010-01-01    L     21  1.26356767
    ##  3: 2010-01-01    L     35 -0.18629723
    ##  4: 2010-01-01    L     42  0.92267902
    ##  5: 2010-01-01    L     45  1.68796072
    ##  6: 2010-01-01    L     47  1.75107447
    ##  7: 2010-01-01    L     49  0.24048407
    ##  8: 2010-01-01    L     52  0.06675221
    ##  9: 2010-01-01    L     57  0.49665392
    ## 10: 2010-01-01   XL     11  0.33682495
    ## 11: 2010-01-01   XL     21  0.67642271
    ## 12: 2010-01-01   XL     35 -0.16412768
    ## 13: 2010-01-01   XL     42  0.72863394
    ## 14: 2010-01-01   XL     45 -0.55527588
    ## 15: 2010-01-01   XL     47  1.30850591
    ## 16: 2010-01-01   XL     49  1.08688166
    ## 17: 2010-01-01   XL     52 -0.31157250
    ## 18: 2010-01-01   XL     57  0.43626422
    

    You could also do

     setkeyv(DT, names(all_filters))
    
     DT[do.call(CJ,all_filters)]
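
    A possible variation, sketched under assumptions rather than taken from the approach above: data.table's on argument performs the same join without setting a key, and nomatch = NULL drops filter combinations that do not occur in the data (by default they come back as rows filled with NA). The small table and filter values here are made up for illustration; nomatch = NULL assumes a reasonably recent data.table, and nomatch = 0L is the older spelling.

```r
library(data.table)

# Unkeyed example table (sorted = FALSE means CJ does not set a key).
DT <- CJ(Period = as.IDate(c("2010-01-01", "2010-04-01")),
         Size   = c("L", "XL"),
         Number = c("11", "21"),
         sorted = FALSE)
DT[, other := seq_len(.N)]

all_filters <- list(Period = as.IDate("2010-01-01"),
                    Size   = c("L", "M"),   # "M" does not occur in DT
                    Number = c("11", "21"))

cj_filter <- do.call(CJ, all_filters)

# Join on the filter columns by name; no setkeyv() call is needed.
# nomatch = NULL drops the combinations involving "M" instead of returning NA rows.
filtered <- DT[cj_filter, on = names(all_filters), nomatch = NULL]
```

    Because the join columns are named in on, the key does not have to be reset when the user changes which fields are filtered. A keyed join may still be faster when the same columns are filtered repeatedly, so this is a trade-off to measure on the real data.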
    