The Archive Base

Asked by Editorial Team on June 11, 2026 (2026-06-11T14:51:59+00:00)

I have a large set of data saved in a long list. This is an example of the first six records:

A <- list(c("JAMES","CHARLES","JAMES","RICHARD"),  
c("JOHN","ROBERT","CHARLES"),  
c("CHARLES","WILLIAM","CHARLES","MICHAEL","WILLIAM","DAVID","CHARLES","WILLIAM"),  
c("CHARLES"),  
c("CHARLES","CHARLES"),  
c("MATTHEW","CHARLES","JACK"))  

For each unique term, I would like to calculate a ratio: the sum of the term's relative frequencies within each record, divided by the number of records the term appears in.

I calculate the numerator, i.e. the sum of the relative frequency with which each unique term occurs in each record, like this:

> B <- lapply(A, function(x)table(x)/length(x))  
> aggregate(unlist(B), list(names(unlist(B))), FUN=sum)  
   Group.1         x
1  CHARLES 3.2916667  
2    DAVID 0.1250000  
3     JACK 0.3333333  
4    JAMES 0.5000000  
5     JOHN 0.3333333  
6  MATTHEW 0.3333333  
7  MICHAEL 0.1250000  
8  RICHARD 0.2500000  
9   ROBERT 0.3333333  
10 WILLIAM 0.3750000  

I’m not sure how to calculate the denominator, i.e. the number of records each term appears in, correctly, though. I only know how to count how many times each term occurs in the whole data set:

> table(unlist(A))  

CHARLES   DAVID    JACK   JAMES    JOHN MATTHEW MICHAEL RICHARD  ROBERT WILLIAM
      9       1       1       2       1       1       1       1       1       3

But some terms occur more than once within a record, and I’d like to count each term only once per record in order to get a result like this:

CHARLES   DAVID    JACK   JAMES    JOHN MATTHEW MICHAEL RICHARD  ROBERT WILLIAM
      6       1       1       1       1       1       1       1       1       1

How can this be achieved?
Based on my example I would like to get a final output similar to this:

   Group.1         x
1  CHARLES 0.5486111  
2    DAVID 0.1250000  
3     JACK 0.3333333  
4    JAMES 0.5000000  
5     JOHN 0.3333333  
6  MATTHEW 0.3333333  
7  MICHAEL 0.1250000  
8  RICHARD 0.2500000  
9   ROBERT 0.3333333  
10 WILLIAM 0.3750000  

So how can I calculate the number of records each term appears in, i.e. the denominator, and the ratio itself?

Thank you very much in advance for your consideration!

1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 2:52 pm (2026-06-11T14:52:01+00:00)

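    When aggregating, use mean instead of sum. unlist(B) holds exactly one entry per (record, term) pair, so the number of entries for a term equals the number of records it appears in, and the mean of those entries is the sum divided by that count: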

    aggregate(unlist(B), list(names(unlist(B))), FUN=mean)  
    #    Group.1         x
    # 1  CHARLES 0.5486111
    # 2    DAVID 0.1250000
    # 3     JACK 0.3333333
    # 4    JAMES 0.5000000
    # 5     JOHN 0.3333333
    # 6  MATTHEW 0.3333333
    # 7  MICHAEL 0.1250000
    # 8  RICHARD 0.2500000
    # 9   ROBERT 0.3333333
    # 10 WILLIAM 0.3750000
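
If the denominator is also needed on its own, it can be computed explicitly. The following is a minimal sketch, assuming the list A from the question; the variable names rel, numerator, denominator and ratio are illustrative and not part of the original post. It drops repeated terms within each record using unique() before counting, then divides the summed relative frequencies by that count:

```r
# Example data from the question
A <- list(c("JAMES", "CHARLES", "JAMES", "RICHARD"),
          c("JOHN", "ROBERT", "CHARLES"),
          c("CHARLES", "WILLIAM", "CHARLES", "MICHAEL",
            "WILLIAM", "DAVID", "CHARLES", "WILLIAM"),
          c("CHARLES"),
          c("CHARLES", "CHARLES"),
          c("MATTHEW", "CHARLES", "JACK"))

# Numerator: relative frequency of each term within each record,
# then summed per term across records
B <- lapply(A, function(x) table(x) / length(x))
rel <- unlist(B)  # named vector, one entry per (record, term) pair
numerator <- tapply(rel, names(rel), sum)

# Denominator: remove repeats within each record, then count how many
# records each term appears in
denominator <- table(unlist(lapply(A, unique)))

# Ratio; index the denominator by name so both vectors line up
ratio <- numerator / as.vector(denominator[names(numerator)])

# Sanity check: the explicit ratio matches the mean-based shortcut
means <- tapply(rel, names(rel), mean)
stopifnot(isTRUE(all.equal(as.vector(ratio), as.vector(means))))

print(denominator)
print(ratio)
```

For the example data, denominator gives 6 for CHARLES and 1 for every other term, and ratio reproduces the values in the desired output from the question (for instance 3.2916667 / 6 = 0.5486111 for CHARLES).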
    
