The Archive Base
Editorial Team
Asked: May 27, 2026


I have a large data set

 dim(dt)
 [1] 422096    162

where dt is a data.table keyed on tic. For each group, I am trying to measure how many entries are missing. The groups are time series: dt contains a date column (an R Date) and a book_lev column, which is my variable of interest.
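To make the setup concrete, here is a minimal sketch using a hypothetical toy data.table with the same shape as dt: a tic key, a Date column date, and the variable book_lev. The tickers, dates, and values (and the names toy and naCounts) are invented for illustration. It shows the simplest per-group count of missing book_lev values, using data.table's .N (the number of rows in the current group).

```r
library(data.table)

# Hypothetical toy data: two tickers with annual observations,
# including one NA in book_lev and one gap in the years of BBB
toy <- data.table(
  tic      = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  date     = as.Date(c("2000-12-31", "2001-12-31", "2002-12-31",
                       "2000-12-31", "2003-12-31")),
  book_lev = c(0.5, NA, 0.7, 0.2, 0.3)
)
setkey(toy, tic)

# Per ticker: number of rows and number of missing book_lev values
naCounts <- toy[, list(n.rows = .N, n.na = sum(is.na(book_lev))), by = tic]
print(naCounts)
```

Note that this only counts NAs in rows that exist; years with no row at all (such as 2001 and 2002 for BBB) are the "holes" that the fullness measure discussed below is meant to capture.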

This is my code so far:

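# First and last observation date for each tic
sumdt <- dt[ ,list(min.date=min(date), max.date=max(date)), by="tic"]
dt <- dt[sumdt]

# Number of observations in each tic's time series
sublengths <- dt[,list(tslen=length(date)),by=tic]
bt2 <- dt[sublengths, mult="first"]
# extractyear() (helper not shown) returns the year of a date
bt2[, max.year:=extractyear(max.date)]
bt2[, min.year:=extractyear(min.date)]
# Equals 1 when there is exactly one observation per year with no gaps
bt2[, data.fullness:=tslen/(max.year - min.year + 1)]

dt <- dt[bt2]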

My idea was to create a data.fullness value that equals 1 when there are no holes in the time series. I also realize that my book_lev column may contain some NAs, so I would like to restrict the measure further to non-missing values. More generally, I am new to data.table and would like to know whether there are better ways to write what I have written.
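One possible way to fold the NA restriction into the fullness measure is to drop rows with a missing book_lev before grouping, and then divide the number of distinct years with data by the span of years. This is a sketch under the assumption that the data are at most annual, so "no holes" means every year from the first to the last has a non-missing value; the toy table and the names toy, fullness, and n.years are hypothetical.

```r
library(data.table)

# Hypothetical toy data (same shape as dt: tic, date, book_lev)
toy <- data.table(
  tic      = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  date     = as.Date(c("2000-12-31", "2001-12-31", "2002-12-31",
                       "2000-12-31", "2003-12-31")),
  book_lev = c(0.5, NA, 0.7, 0.2, 0.3)
)

# Calendar year as an integer (date is already an R Date)
toy[, year := as.integer(format(date, "%Y"))]

# Keep only rows where book_lev is observed, then summarise per ticker
fullness <- toy[!is.na(book_lev),
                list(min.year = min(year),
                     max.year = max(year),
                     n.years  = uniqueN(year)),
                by = tic]

# 1 means every year between the first and last observation has data
fullness[, data.fullness := n.years / (max.year - min.year + 1)]
print(fullness)
```

Counting distinct years (uniqueN) rather than rows keeps the measure from exceeding 1 if a ticker happens to have several observations in the same year.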

A small sample of the data, which you can load with R’s load() function, is available here: http://econsteve.com/r/dt_sample.Robj

1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 11:58 am

    (First, a caveat. I’m not sure I correctly understood what you want your data.fullness variable to summarize. Based on the dataset you’ve linked to, I’m taking it to be the proportion of years with some data, in the interval from the first measured year to the last measured year.)
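As a quick check of that definition, the proportion is the number of distinct years with data divided by (last year - first year + 1). The base-R sketch below applies the same arithmetic as calcFullness in the code that follows to a hypothetical vector of years; the names yrs, n.observed, n.span, and fullness are illustrative only.

```r
# Hypothetical years in which one ticker has at least one observation
yrs <- c(1995, 1996, 1996, 1998, 1999)

# Distinct years with data, and the span from first to last year
n.observed <- length(unique(yrs))    # 4 (1995, 1996, 1998, 1999)
n.span     <- diff(range(yrs)) + 1   # 5 (1995 through 1999)

fullness <- n.observed / n.span      # 0.8: one year (1997) is missing
print(fullness)
```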

    Here is the approach I’d take to the problem as I do understand it:

    library(data.table)
    
    ## FIRST, DEFINE A COUPLE OF FUNCTIONS
    
    # Year of a date. Parses "mm/dd/yyyy" strings; a column that is
    # already of class Date is returned unchanged by as.Date().
    extractYear <- function(X) {
        as.numeric(format(as.Date(X, format="%m/%d/%Y"), "%Y"))
    }
    
    # Share of years between the first and last year that have any data
    calcFullness <- function(YRS) {
        length(unique(YRS))/(diff(range(YRS))+1)
    }
    
    ## THEN SET TO WORK ON YOUR DATA.TABLE
    
    setkey(dt, tic)
    dt[, year:=extractYear(datadate)]
    
    # Extract summaries for each level of tic
    ticSumm <- 
        dt[, list(min.year = min(year),
                  max.year = max(year),
                  data.fullness = calcFullness(year)), by=tic]
    ticSumm
    #       tic min.year max.year data.fullness
    # [1,] AMZN     1995     2010             1
    # [2,]   GM     1950     2010             1
    # [3,]  XOM     1950     2010             1
    
    
    # Merge summary back into dt (join on the key, tic)
    dt <- dt[ticSumm]
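A possible refinement of the last step, assuming a data.table version that supports the on= argument (1.9.6 or later): instead of dt <- dt[ticSumm], which builds a new table, the summary columns can be added to dt by reference with an update join. The i. prefix refers to columns of the table passed as i (here ticSumm). The small dt and ticSumm below are hypothetical stand-ins for the real tables.

```r
library(data.table)

# Hypothetical stand-ins for dt and the per-ticker summary ticSumm
dt <- data.table(tic = c("AMZN", "AMZN", "GM"), year = c(2009, 2010, 2010))
ticSumm <- dt[, list(min.year = min(year),
                     max.year = max(year),
                     data.fullness = length(unique(year)) /
                                     (diff(range(year)) + 1)),
              by = tic]

# Update join: add the summary columns to dt in place, matching on tic
dt[ticSumm, on = "tic",
   `:=`(min.year      = i.min.year,
        max.year      = i.max.year,
        data.fullness = i.data.fullness)]
print(dt)
```

Because the update happens by reference, dt keeps its original row order and no copy of the large table is made, which may matter at 422096 rows.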
    

Related Questions

I have a very large possible data set that I am trying to visualize
I have a large data set that I'm working with in excel. About 1000+
Interpolating Large Datasets I have a large data set of about 0.5million records representing
I have a large real 1-d data set called r. I would like plot:
I have a large set of data (a data cube of 250,000 X 1,000
I have a large set of data which I access via a generator/iterator. While
I have implemented paging for a large set of data in an application by
I have a customer showing Notepad with a large set of data that looks
I have situation where a user can manipulate a large set of data (presented
I have a large data set, which I want to query. The query does not

© 2021 The Archive Base. All Rights Reserved