The Archive Base Latest Questions

Editorial Team
Asked: May 18, 2026

I have a data frame with 900,000 rows and 11 columns in R. The column names and types are as follows:

column name: date / mcode / mname / ycode / yname / yissue  / bsent   / breturn / tsent   / treturn / csales
type:        Date / Char  / Char  / Char  / Char  / Numeric / Numeric / Numeric / Numeric / Numeric / Numeric

I want to calculate subtotals: for each distinct yname, the sum of every numeric column. There are 160 distinct ynames, so the resulting table should have one row of subtotals per yname. I haven't sorted the data yet, but that is not a problem because I can sort it however I need. Below is a snippet from my data:

             date     mcode mname            ycode    yname   yissue bsent breturn tsent treturn csales
417572 2010-07-28     45740 ENDPOINT A        5772    XMAG  20100800     7       0     7       0      0
417573 2010-07-31     45740 ENDPOINT A        5772    XMAG  20100800     0       0     0       0      1
417574 2010-08-04     45740 ENDPOINT A        5772    XMAG  20100800     0       0     0       0      1
417575 2010-08-14     45740 ENDPOINT A        5772    XMAG  20100800     0       0     0       0      1
417576 2010-08-26     45740 ENDPOINT A        5772    XMAG  20100800     0       4     0       0      0
417577 2010-07-28     45741 ENDPOINT L        5772    XMAG  20100800     2       0     2       0      0
417578 2010-08-04     45741 ENDPOINT L        5772    XMAG  20100800     2       0     2       0      0
417579 2010-08-26     45741 ENDPOINT L        5772    XMAG  20100800     0       4     0       0      0
417580 2010-07-28     46390 ENDPOINT R        5772    XMAG  20100800     3       0     3       0      1
417581 2010-07-29     46390 ENDPOINT R        5772    XMAG  20100800     0       0     0       0      2
417582 2010-08-01     46390 ENDPOINT R        5779    YMAG  20100800     3       0     3       0      0
417583 2010-08-11     46390 ENDPOINT R        5779    YMAG  20100800     0       0     0       0      1
417584 2010-08-20     46390 ENDPOINT R        5779    YMAG  20100800     0       0     0       0      1
417585 2010-08-24     46390 ENDPOINT R        5779    YMAG  20100800     2       0     2       0      1
417586 2010-08-26     46390 ENDPOINT R        5779    YMAG  20100800     0       2     0       2      0
417587 2010-07-28     46411 ENDPOINT D        5779    YMAG  20100800     6       0     6       0      0
417588 2010-08-08     46411 ENDPOINT D        5779    YMAG  20100800     0       0     0       0      1
417589 2010-08-11     46411 ENDPOINT D        5779    YMAG  20100800     0       0     0       0      1
417590 2010-08-26     46411 ENDPOINT D        5779    YMAG  20100800     0       4     0       4      0

What function should I use here? Is there something like SQL's GROUP BY?


1 Answer

  1. Editorial Team
    Added an answer on May 18, 2026 at 2:12 am

    OK. Assuming your data are in a data frame named foo:

    > head(foo)
                 date mcode      mname ycode yname   yissue bsent breturn tsent
    417572 2010/07/28 45740 ENDPOINT A  5772  XMAG 20100800     7       0     7
    417573 2010/07/31 45740 ENDPOINT A  5772  XMAG 20100800     0       0     0
    417574 2010/08/04 45740 ENDPOINT A  5772  XMAG 20100800     0       0     0
    417575 2010/08/14 45740 ENDPOINT A  5772  XMAG 20100800     0       0     0
    417576 2010/08/26 45740 ENDPOINT A  5772  XMAG 20100800     0       4     0
    417577 2010/07/28 45741 ENDPOINT L  5772  XMAG 20100800     2       0     2
           treturn csales
    417572       0      0
    417573       0      1
    417574       0      1
    417575       0      1
    417576       0      0
    417577       0      0
    

    Then this will do the aggregation of the numeric columns in your data:

    > aggregate(cbind(bsent, breturn, tsent, treturn, csales) ~ yname, data = foo, 
    +           FUN = sum)
      yname bsent breturn tsent treturn csales
    1  XMAG    14       8    14       0      6
    2  YMAG    11       6    11       6      5
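
To make the call above reproducible, here is a minimal sketch that rebuilds only yname and the five numeric columns from the snippet in the question (the date, mcode, mname, ycode and yissue columns are left out, and `num_cols`, `by_formula` and `by_list` are illustrative names, not part of the original answer). It also runs the same aggregation through the non-formula `by =` interface for comparison.

```r
# Rebuild yname and the numeric columns of the question's 19-row snippet.
# Assumption: the other columns are omitted because they are not summed.
foo <- data.frame(
  yname   = rep(c("XMAG", "YMAG"), times = c(10, 9)),
  bsent   = c(7, 0, 0, 0, 0, 2, 2, 0, 3, 0,  3, 0, 0, 2, 0, 6, 0, 0, 0),
  breturn = c(0, 0, 0, 0, 4, 0, 0, 4, 0, 0,  0, 0, 0, 0, 2, 0, 0, 0, 4),
  tsent   = c(7, 0, 0, 0, 0, 2, 2, 0, 3, 0,  3, 0, 0, 2, 0, 6, 0, 0, 0),
  treturn = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  0, 0, 0, 0, 2, 0, 0, 0, 4),
  csales  = c(0, 1, 1, 1, 0, 0, 0, 0, 1, 2,  0, 1, 1, 1, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

num_cols <- c("bsent", "breturn", "tsent", "treturn", "csales")

# Formula interface: column names are looked up in `data`.
by_formula <- aggregate(cbind(bsent, breturn, tsent, treturn, csales) ~ yname,
                        data = foo, FUN = sum)

# Non-formula interface: columns and grouping variable are passed explicitly.
by_list <- aggregate(foo[num_cols], by = list(yname = foo$yname), FUN = sum)

print(by_formula)
print(by_list)
```

Both calls should print one row per yname with the same subtotals; the formula version is simply shorter to write when many columns are involved.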
    

    That was using the snippet of data you included in your Q. I used the formula interface to aggregate(), which is a bit nicer in this instance because you don't need all the foo$ bits on the names of the variables you wish to aggregate. If you have missing data (NA) in your full data set, you can add an extra argument na.rm = TRUE, which will get passed to sum(), like so:

    > aggregate(cbind(bsent, breturn, tsent, treturn, csales) ~ yname, data = foo, 
    +           FUN = sum, na.rm = TRUE)
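
One caveat worth checking on your own data: the formula method of aggregate() has a default `na.action = na.omit`, which drops any row containing an NA in one of the formula's variables before sum() is called. The sketch below uses hypothetical toy data (`foo_na`, with made-up values) to show the difference between that default and `na.action = na.pass`, which keeps the row so that `na.rm = TRUE` skips only the missing cell.

```r
# Hypothetical toy data: a single NA in bsent for group "A".
foo_na <- data.frame(
  yname  = c("A", "A", "B"),
  bsent  = c(1, NA, 3),
  csales = c(10, 20, 30)
)

# Default na.action = na.omit: the whole second row is dropped,
# so its csales value is not counted in group "A".
dropped <- aggregate(cbind(bsent, csales) ~ yname, data = foo_na,
                     FUN = sum, na.rm = TRUE)

# na.action = na.pass keeps the row; na.rm = TRUE then ignores only the NA.
kept <- aggregate(cbind(bsent, csales) ~ yname, data = foo_na,
                  FUN = sum, na.rm = TRUE, na.action = na.pass)

print(dropped)
print(kept)
```

If the rows with missing values should still contribute their non-missing columns to the subtotals, the second form is probably the one you want.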
    
© 2021 The Archive Base. All Rights Reserved