The Archive Base

Editorial Team
Asked: June 17, 2026

While experimenting with aggregate for another question here, I encountered a rather strange result. I’m unable to figure out why and am wondering if what I’m doing is totally wrong.

Suppose I have a data.frame like this:

df <- structure(list(V1 = c(1L, 2L, 1L, 2L, 3L, 1L), 
                     V2 = c(2L, 3L, 2L, 3L, 4L, 2L), 
                     V3 = c(3L, 4L, 3L, 4L, 5L, 3L), 
                     V4 = c(4L, 5L, 4L, 5L, 6L, 4L)), 
                  .Names = c("V1", "V2", "V3", "V4"), 
        row.names = c(NA, -6L), class = "data.frame")
> df
#   V1 V2 V3 V4
# 1  1  2  3  4
# 2  2  3  4  5
# 3  1  2  3  4
# 4  2  3  4  5
# 5  3  4  5  6
# 6  1  2  3  4

Now I want to output a data.frame containing the unique rows of df, with an additional column giving each row's frequency in df. For this example, that would be:

#   V1 V2 V3 V4 x
# 1  1  2  3  4 3
# 2  2  3  4  5 2
# 3  3  4  5  6 1

I obtained this output using aggregate by experimenting as follows:

> aggregate(do.call(paste, df), by=df, print)

# [1] "1 2 3 4" "1 2 3 4" "1 2 3 4"
# [1] "2 3 4 5" "2 3 4 5"
# [1] "3 4 5 6"
#   V1 V2 V3 V4                         x
# 1  1  2  3  4 1 2 3 4, 1 2 3 4, 1 2 3 4
# 2  2  3  4  5          2 3 4 5, 2 3 4 5
# 3  3  4  5  6                   3 4 5 6

This gave me the pasted strings for each group. So if I used length instead of print, it should give me the number of occurrences of each row, which is the desired result. That was indeed the case, as shown below.

> aggregate(do.call(paste, df), by=df, length)
#   V1 V2 V3 V4 x
# 1  1  2  3  4 3
# 2  2  3  4  5 2
# 3  3  4  5  6 1

And this seemed to work. However, when the data.frame has 4 rows and 2500 columns, the output data.frame is 1*2501 instead of the expected 4*2501 (all rows are unique, so each frequency should be 1).

> df <- as.data.frame(matrix(sample(1:3, 1e4, replace = TRUE), nrow=4))
> o <- aggregate(do.call(paste, df), by=df, length)
> dim(o)
# [1]    1 2501

I tested with smaller data.frames with just unique rows and it gives the right output (change nrow=40, for example). However, when the dimensions of the matrix increase, this doesn’t seem to work. And I just can’t figure out what’s going wrong! Any ideas?

1 Answer

Editorial Team
Added an answer on June 17, 2026 at 6:23 pm

    The issue here is how aggregate.data.frame() determines the groups.

    In aggregate.data.frame() there is a loop which forms the grouping variable grp. In that loop, grp is altered/updated via:

    grp <- grp * nlevels(ind) + (as.integer(ind) - 1L)
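
To see what this update does, here is a minimal sketch that replays it by hand on the small data.frame from the question. It is not the actual source of aggregate.data.frame(); the starting value of grp and the loop over the reversed columns are assumptions made for illustration. Each distinct row ends up with its own numeric code, built like a mixed-radix number from the factor levels.

```r
# Small example data from the question
df <- data.frame(V1 = c(1L, 2L, 1L, 2L, 3L, 1L),
                 V2 = c(2L, 3L, 2L, 3L, 4L, 2L),
                 V3 = c(3L, 4L, 3L, 4L, 5L, 3L),
                 V4 = c(4L, 5L, 4L, 5L, 6L, 4L))

# Assumed starting value: one zero per row (double precision)
grp <- numeric(nrow(df))

# Replay the quoted update once per grouping column
for (ind in rev(as.list(df))) {
  ind <- as.factor(ind)
  grp <- grp * nlevels(ind) + (as.integer(ind) - 1L)
}

grp
# 0 40 0 40 80 0
```

With only four columns of three levels each, the codes stay small and the three distinct rows get the three distinct codes 0, 40 and 80.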
    

    The problem is that once by has been converted to factors and the loop has gone over all of them, grp in your example ends up being:

    Browse[2]> grp
    [1] Inf Inf Inf Inf
    

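    Essentially, the looping update pushed the values of grp beyond what a double can represent. With 2500 grouping columns of up to three levels each, the codes can grow towards 3^2500, far above the largest finite double (about 1.8e308), so every element overflows to Inf and the rows can no longer be told apart.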
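
The overflow can be reproduced outside the debugger with a short sketch. As above, this only mimics the quoted update and is not the real function body; the seed and the starting value of grp are assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable
df <- as.data.frame(matrix(sample(1:3, 1e4, replace = TRUE), nrow = 4))

# Assumed starting value: one zero per row (double precision)
grp <- numeric(nrow(df))

# Same update as quoted above, applied once for each of the 2500 columns
for (ind in rev(as.list(df))) {
  ind <- as.factor(ind)
  grp <- grp * nlevels(ind) + (as.integer(ind) - 1L)
}

grp                    # every element should have overflowed to Inf
length(unique(grp))    # so only one distinct group code is left
.Machine$double.xmax   # largest finite double, for comparison
```

Once a value has reached Inf, multiplying it by the number of levels and adding a level code leaves it at Inf, so the information that distinguished the rows is lost.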

    Having done that, aggregate.data.frame() later does this

    y <- y[match(sort(unique(grp)), grp, 0L), , drop = FALSE]
    

    and this is where the earlier problem shows up. The subset inspected by

    dim(y[match(sort(unique(grp)), grp, 0L), , drop = FALSE])
    

    has only a single row, because

    match(sort(unique(grp)), grp, 0L)
    

    clearly returns just 1:

    > match(sort(unique(grp)), grp, 0L)
    [1] 1
    

    as there is only one unique value of grp.
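
The collapse in the match() step can be checked in isolation, and the same sketch shows one possible way to get the row counts without any numeric group code. The helper name count_rows and the "\r" separator are assumptions for illustration, not part of aggregate(); the approach assumes the separator never occurs in the data.

```r
# Part 1: with an all-Inf grp, match() can only ever find the first row
grp <- rep(Inf, 4)
idx <- match(sort(unique(grp)), grp, 0L)
idx
# 1

# Part 2: hypothetical helper that counts rows via a pasted key instead
count_rows <- function(d) {
  # One string per row; "\r" is assumed not to appear in the data
  key <- do.call(paste, c(unname(as.list(d)), sep = "\r"))
  first <- !duplicated(key)              # first occurrence of each row
  out <- d[first, , drop = FALSE]
  out$x <- as.vector(table(key)[key[first]])  # frequency of each key
  rownames(out) <- NULL
  out
}

# Small example data from the question
df <- data.frame(V1 = c(1L, 2L, 1L, 2L, 3L, 1L),
                 V2 = c(2L, 3L, 2L, 3L, 4L, 2L),
                 V3 = c(3L, 4L, 3L, 4L, 5L, 3L),
                 V4 = c(4L, 5L, 4L, 5L, 6L, 4L))
out <- count_rows(df)
out
```

Because the key is a character string rather than a number, its size grows with the number of columns but it cannot overflow. Unlike aggregate(), this sketch returns the rows in order of first appearance rather than sorted.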

© 2021 The Archive Base. All Rights Reserved