Editorial Team
Asked: May 26, 2026

I have a data frame that contains multiple data points for a large number of samples. Here is a shortened example with 3 samples each with 3 data points:

Assay       Genotype      Sample 
CCT6-002        G         sam1   
CCT6-007        G         sam1
CCT6-013        C         sam1 
CCT6-002        T         sam2   
CCT6-007        A         sam2
CCT6-013        T         sam2 
CCT6-002        T         sam3   
CCT6-007        A         sam3
CCT6-013        T         sam3 

To do my downstream analysis, I would like to subset the data for each sample into an individual data frame. Since this is something that I will be doing with many data sets with changing sample names, I’d like to figure out an automated way of doing this so I don’t need to edit my script each time with the list of new samples.

I would like my output to be a data frame for each sample with the same name as the sample. So with the example data above, the result should be 3 data frames with the names sam1, sam2, sam3. Each data frame would have 3 lines with the Assay and genotype data.

I am sorry if this is a very basic question, but I’m a newbie and have been working on this for quite a while. Thanks!

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 12:16 am

    The split command is the easiest way to turn this into a list of data.frame objects split on sample.

    myList <- split(mydf, mydf$Sample)
    

    The items in the list can be accessed by numeric indexing (e.g. myList[[1]]) or by the name of the unique item in the variable Sample (e.g. myList$sam1).
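    As a minimal sketch of the above, the block below rebuilds the example data from the question and splits it. The name mydf and the column types (plain character columns) are assumptions; your real data may be read from a file and may store Sample as a factor, which split() handles the same way.

```r
# Rebuild the example data from the question (assumed to be character columns).
mydf <- data.frame(
  Assay    = rep(c("CCT6-002", "CCT6-007", "CCT6-013"), times = 3),
  Genotype = c("G", "G", "C", "T", "A", "T", "T", "A", "T"),
  Sample   = rep(c("sam1", "sam2", "sam3"), each = 3),
  stringsAsFactors = FALSE
)

# One data.frame per unique value of Sample, collected in a named list.
myList <- split(mydf, mydf$Sample)

names(myList)      # the list elements are named after the samples
myList[[1]]        # access by position
myList$sam1        # access by name
myList[["sam2"]]   # access by a name held in a string, handy inside loops
```

    Because the list names come from the data, nothing in the script needs editing when the sample names change.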

    The numeric indexing is obviously handy when you’re going through a sequence, but you can use the names for that as well.

    # get the names of the unique items in Sample
    nam <- unique(mydf$Sample)
    # as a test, look at the first few rows of each data.frame in the list
    for (i in nam) print(head(myList[[i]]))
    # with() is another way to get at the columns of each data.frame
    for (i in nam) with(myList[[i]], print(Assay[1:2]))
    

    That’s not necessarily the most efficient R syntax but hopefully it gets you farther along in actually using your list of data.frame objects.

    Now, that gives you what you asked for, but here’s some advice on what you asked for: don’t do it. Just learn to properly access your data.frame object. You could just as easily skip making the list and go through all of the unique instances of Sample in your code, including saving them out as separate files. The advantage is that you can run lots of nifty vectorized commands on your intact data.frame across Sample that are much harder on the list. Just stick with your nice big data.frame.

    Here are a couple of simple examples. Look at what I did above for just getting the first few lines of each of the separate data.frame objects in the list. Here’s something similar just run on the big data.frame.

    # invisible() stops the console from printing lapply's return value a second time
    invisible(lapply(unique(mydf$Sample), function(x) print(head(mydf[mydf$Sample == x, ]))))
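    In the same spirit, saving each Sample out as a separate file does not need the list either. This is only a sketch: the output directory (tempdir()) and the file naming scheme are assumptions for illustration, and mydf is rebuilt here so the block runs on its own.

```r
# Example data from the question (assumed to be character columns).
mydf <- data.frame(
  Assay    = rep(c("CCT6-002", "CCT6-007", "CCT6-013"), times = 3),
  Genotype = c("G", "G", "C", "T", "A", "T", "T", "A", "T"),
  Sample   = rep(c("sam1", "sam2", "sam3"), each = 3),
  stringsAsFactors = FALSE
)

out_dir <- tempdir()  # hypothetical output location; change to suit

# Loop over the sample names found in the data, not a hand-written list.
for (s in unique(mydf$Sample)) {
  # Keep only this sample's rows and the two data columns.
  one_sample <- mydf[mydf$Sample == s, c("Assay", "Genotype")]
  # Write e.g. sam1.csv, sam2.csv, ... into out_dir.
  write.csv(one_sample, file.path(out_dir, paste0(s, ".csv")), row.names = FALSE)
}
```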
    

    How about something more meaningful? Let’s say I want a count of each individual Genotype separated by Sample.

    table( mydf$Genotype, mydf$Sample)
    

    That’s much easier than what you’d have to do with the big list. There are lots of functions like that, such as tapply and aggregate, that you’ll want to use on your intact data.frame. Even if you wanted to do something that seems like it might be easier with the data.frame broken up, like sorting within each Sample level, it’s easier with the data.frame.

    mydf[ order(mydf$Sample, mydf$Assay), ]
    

    That will order by Sample and then by Assay nested within Sample.
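    The tapply and aggregate functions mentioned earlier work on the intact data.frame in much the same way. The summaries below (distinct genotypes per sample, rows per Sample and Genotype combination) are illustrative choices only, not something the question asked for, and the result names n_genotypes and counts are made up for this sketch.

```r
# Example data from the question (assumed to be character columns).
mydf <- data.frame(
  Assay    = rep(c("CCT6-002", "CCT6-007", "CCT6-013"), times = 3),
  Genotype = c("G", "G", "C", "T", "A", "T", "T", "A", "T"),
  Sample   = rep(c("sam1", "sam2", "sam3"), each = 3),
  stringsAsFactors = FALSE
)

# tapply: apply a function to Genotype within each level of Sample.
# Here: how many distinct genotypes each sample has.
n_genotypes <- tapply(mydf$Genotype, mydf$Sample, function(g) length(unique(g)))

# aggregate: the same idea, but the result is a data.frame.
# Here: number of rows for each Sample/Genotype combination.
counts <- aggregate(Assay ~ Sample + Genotype, data = mydf, FUN = length)
names(counts)[3] <- "n"  # the third column holds the row counts

print(n_genotypes)
print(counts)
```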

    When I started with R, I thought that splitting up data.frame objects was the way to go and did it a lot. Now that I’ve learned R better, I never do that. I don’t have a single bit of R code written after my first few weeks with R that splits up a data.frame into a list. I’m not saying you should never do it. I’m just saying that it’s relatively rare that you need it or that it’s the best idea. You might want to post a query on here about your end goal and get some advice on that.
