I have a data.table DT and I want to run model.matrix on it. Each

Question

Asked: June 16, 2026 (2026-06-16T09:42:29+00:00)

I have a data.table DT and I want to run model.matrix on it. Each row has a string ID, which is stored in the ID column of DT. When I run model.matrix on DT, my formula excludes the ID column. The problem is that model.matrix drops some rows because of NAs. If I set the rownames of DT to the ID column before calling model.matrix, then the final model matrix has rownames, and I’m all set. Otherwise, I can’t figure out which rows I end up with.

I’m setting the rownames with rownames(DT) = DT$ID. However, when I try to add a new column to DT, I get a complaint about:

“Invalid .internal.selfref detected . . . At an earlier point, this
data.table has been copied by R.”

So I’m wondering:

1. Is there a better way to set rownames for a data.table?
2. Is there a better approach to solving this problem?


1 Answer

Editorial Team · Answer 1 · 2026-06-16T09:42:30+00:00

There are a couple of issues here.

First, it is a feature of data.tables that they do not have rownames; instead they have keys, which are far more powerful. See this great vignette.

But it isn’t the end of the world. model.matrix returns sensible rownames when you pass it a data.table.

For example:

library(data.table)

A <- data.table(ID = 1:5, x = c(NA, 1:4), y = c(4:2, NA, 3))

mm <- model.matrix( ~ x + y, A)

rownames(mm)

## [1] "2" "3" "5"

So rows 2, 3 and 5 are the ones included in the model matrix.

Now, you can add this sequence as a column to A. This will be useful if you then set the key to something else (thereby losing the original order):

A[, rowid := seq_len(nrow(A))]

You might consider making it character (like the rownames of mm), but it won’t really matter, as you can just as easily convert rownames(mm) to numeric when you need to reference rows.
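To tie this back to the string IDs in the question, the rownames of mm can be converted to row positions and used to look up the IDs. This is only a minimal sketch: the ID values below are made up for illustration, and kept and kept_ids are hypothetical names that do not appear in the original post.

```r
library(data.table)

# Same toy data as above, but with illustrative string IDs (an assumption)
A <- data.table(ID = c("a", "b", "c", "d", "e"),
                x = c(NA, 1:4),
                y = c(4:2, NA, 3))

mm <- model.matrix(~ x + y, A)

# rownames(mm) are the positions of the rows that survived NA removal
kept <- as.integer(rownames(mm))

# Look up the IDs of those rows in A
kept_ids <- A[kept, ID]

# Optionally label the model matrix with the IDs; mm is a plain matrix,
# so setting its rownames does not touch the data.table
rownames(mm) <- kept_ids
```

Note that the positions refer to the row order of A at the time model.matrix was called, so do the lookup before re-keying or re-sorting A (or use the rowid column described above).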

As for the warning that data.table gives, the next sentence of the message reads:

Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: setkey(), setnames() and setattr()

Rownames are an attribute, so rownames<- (which internally, at some point, uses the equivalent of attr<-) may copy the data.table in the same way.

The line from `row.names<-.data.frame` is

attr(x, "row.names") <- value

That being said, data.tables don’t have rownames, so there is no point setting them.
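If rownames on the input really are wanted, one possible alternative (not part of the original answer) is to set them on a plain data.frame copy and leave the data.table alone. The sketch below assumes the IDs are unique, since data.frame rownames must be; DT, df and z are illustrative names and values.

```r
library(data.table)

# Hypothetical data with string IDs, as described in the question
DT <- data.table(ID = c("a", "b", "c", "d", "e"),
                 x = c(NA, 1:4),
                 y = c(4:2, NA, 3))

# Work on a plain data.frame copy; rownames are set on the copy only
df <- as.data.frame(DT)
rownames(df) <- df$ID

# The model matrix keeps the rownames of the rows that had no NAs
mm <- model.matrix(~ x + y, df)
rownames(mm)

# DT was never passed through rownames<-, so := should work as usual
DT[, z := x + y]
```

The cost is one extra copy of the data, which may matter for a very large table; in that case the row-position lookup described earlier avoids the copy.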
