I have a large set of data saved in a long list. This is

Asked: June 11, 2026 (2026-06-11T14:51:59+00:00)

I have a large set of data saved in a long list. This is an example of the first six records:

A <- list(c("JAMES","CHARLES","JAMES","RICHARD"),  
c("JOHN","ROBERT","CHARLES"),  
c("CHARLES","WILLIAM","CHARLES","MICHAEL","WILLIAM","DAVID","CHARLES","WILLIAM"),  
c("CHARLES"),  
c("CHARLES","CHARLES"),  
c("MATTHEW","CHARLES","JACK"))

For each unique term, I would like to calculate a ratio: the sum of the term's relative frequencies within each record, divided by the number of records the term appears in.

I calculate the numerator, i.e. the sum of the relative frequency with which each unique term occurs in each record, like this:

> B <- lapply(A, function(x)table(x)/length(x))  
> aggregate(unlist(B), list(names(unlist(B))), FUN=sum)  
Group.1         x  
1  CHARLES 3.2916667  
2    DAVID 0.1250000  
3     JACK 0.3333333  
4    JAMES 0.5000000  
5     JOHN 0.3333333  
6  MATTHEW 0.3333333  
7  MICHAEL 0.1250000  
8  RICHARD 0.2500000  
9   ROBERT 0.3333333  
10 WILLIAM 0.3750000

I’m not sure how to correctly calculate the denominator, i.e. the number of records each term appears in. I only know how to count how many times each term occurs in the whole data set:

> table(unlist(A))  

CHARLES   DAVID   JACK   JAMES    JOHN MATTHEW MICHAEL RICHARD  ROBERT WILLIAM  
   9       1       1       2       1       1       1       1       1       3

But some terms occur more than once within a record and I’d like to omit these repetitions in order to get a result like this:

CHARLES   DAVID   JACK   JAMES    JOHN MATTHEW MICHAEL RICHARD  ROBERT WILLIAM  
   6       1       1       1       1       1       1       1       1       1

How can this be achieved?
Based on my example I would like to get a final output similar to this:

Group.1         x  
1  CHARLES 0.5486111  
2    DAVID 0.1250000  
3     JACK 0.3333333  
4    JAMES 0.5000000  
5     JOHN 0.3333333  
6  MATTHEW 0.3333333  
7  MICHAEL 0.1250000  
8  RICHARD 0.2500000  
9   ROBERT 0.3333333  
10 WILLIAM 0.3750000

So how can I calculate the number of records each term appears in, i.e. the denominator, and the ratio itself?

Thank you very much in advance for your consideration!

1 Answer

Editorial Team · Answer 1 · 2026-06-11T14:52:01+00:00

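When aggregating, use mean instead of sum. Each record contributes at most one entry per term to B, so the number of entries for a term equals the number of records it appears in, and the mean is exactly the sum divided by that count: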

aggregate(unlist(B), list(names(unlist(B))), FUN=mean)  
#    Group.1         x
# 1  CHARLES 0.5486111
# 2    DAVID 0.1250000
# 3     JACK 0.3333333
# 4    JAMES 0.5000000
# 5     JOHN 0.3333333
# 6  MATTHEW 0.3333333
# 7  MICHAEL 0.1250000
# 8  RICHARD 0.2500000
# 9   ROBERT 0.3333333
# 10 WILLIAM 0.3750000
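
If the denominator is also needed on its own, one possible approach is to drop the repeats within each record with unique() before counting. The following is only a sketch based on the example data from the question; the names denom, rel, numer and ratio are illustrative and not part of the original post.

```r
# Example data from the question
A <- list(c("JAMES","CHARLES","JAMES","RICHARD"),
          c("JOHN","ROBERT","CHARLES"),
          c("CHARLES","WILLIAM","CHARLES","MICHAEL","WILLIAM","DAVID","CHARLES","WILLIAM"),
          c("CHARLES"),
          c("CHARLES","CHARLES"),
          c("MATTHEW","CHARLES","JACK"))

# Denominator: number of records each term appears in.
# unique() removes repeated terms within a record before counting.
denom <- table(unlist(lapply(A, unique)))

# Numerator: sum of the within-record relative frequencies, as in the question.
B <- lapply(A, function(x) table(x) / length(x))
rel <- unlist(B)
numer <- sapply(split(rel, names(rel)), sum)

# Ratio: index denom by name so both vectors are in the same order.
ratio <- numer / as.vector(denom[names(numer)])
print(ratio)
```

For this data the ratio should agree with the FUN=mean result above, so computing the denominator explicitly is mainly useful for checking the result or for reporting the record counts separately.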
