I want to determine the column classes of a large data.table.
colClasses <- sapply(DT, FUN=function(x)class(x)[1])
works, but apparently local copies are stored into memory:
> memory.size()
[1] 687.59
> colClasses <- sapply(DT, class)
> memory.size()
[1] 1346.21
A loop over the columns does not seem possible, because subsetting a data.table with “with=FALSE” always returns a data.table.
A quick and very dirty method is:
DT1 <- DT[1, ]
colClasses <- sapply(DT1, FUN=function(x)class(x)[1])
What is the most elegant and efficient way to do this?
Have briefly investigated, and it looks like a data.table bug.
So, looking at as.list.data.table:
Note the pesky unclass on the first line. ?unclass confirms that it takes a deep copy of its argument. From this quick look it doesn’t seem like sapply or lapply are doing the copying (I didn’t think they did since R is good at copy-on-write, and those aren’t writing), but rather the as.list in lapply (which dispatches to as.list.data.table).
So, if we avoid the unclass, it should speed up. Let’s try:
So, yes, infinitely better.
I’ve raised bug report #2000 to remove the as.list.data.table method, since a data.table is() already a list, too. This might speed up quite a few idioms actually, such as lapply(.SD,...). [EDIT: This was fixed in v1.8.1].
Thanks for asking this question!!
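For anyone on a version older than the fix, here is a minimal sketch of one way to get the classes without calling sapply or lapply on the table itself. It is not the code from the investigation above; the example table and its column names are made up for illustration. The idea is to loop over the column names and pull each column out with [[, which is ordinary list extraction and so should not go through as.list.data.table.

```r
library(data.table)

# Hypothetical example table; the columns are invented for illustration.
DT <- data.table(id = 1:5, value = c(0.5, 1.5, 2.5, 3.5, 4.5), label = letters[1:5])

# Iterate over the column names (a plain character vector), not over DT,
# so no as.list() conversion of the whole table is triggered.
# DT[[nm]] extracts one column; class(...)[1L] keeps only the first class.
colClasses <- vapply(names(DT), function(nm) class(DT[[nm]])[1L], character(1))

print(colClasses)
```

The result is a character vector named by column, the same shape as what sapply(DT, FUN=function(x)class(x)[1]) returns. Whether this actually saves memory on a given installation is an assumption worth checking, for example by comparing memory use before and after the call as in the question.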