I have a fairly basic data.table in R, with 250k rows and 90 columns. I am trying to key the data.table on one of the columns which is of class character. When I call:
setkey(my.dt,my.column)
I receive the following cryptic error message:
"Error in setkeyv(x, cols, verbose=verbose) :
reorder received irregular lengthed list"
I have found a source-code commit with this message, but can’t quite decipher what it means. My key column contains no NA or blank values, seems perfectly reasonable to look at (it contains stock tickers), and behaves well with the default order() command.
Even more frustrating, the following code completes correctly:
first.dt <- my.dt[1:100000]
setkey(first.dt,my.column)
second.dt <- my.dt[100001:nrow(my.dt)]
setkey(second.dt,my.column)
I have no idea what could be going on here. Any tips?
Edit 1: I have confirmed every value in the key fits a fairly standard format:
> length(grep("[A-Z]{3,4}\\.[A-Z]{2}",my.dt$my.column)) == nrow(my.dt)
[1] TRUE
Edit 2: My system info is below (note that I’m actually using Windows 7). I am using data.table version 1.8.
> Sys.info()
sysname release version nodename machine login
"Windows" "Server 2008 x64" "build 7600" "WIN-9RH28AH0CKG" "x86-64" "Administrator"
user effective_user
"Administrator" "Administrator"
Please run sapply(my.dt, length).
I suspect that one or more columns have a different length to the first column, and that’s an invalid data.table. It won’t be one of the first 5, because your .Internal(inspect(my.dt)) output (thanks) shows those and they’re OK.
If so, there is this bug fix in v1.8.1 :
Any chance there’s an rbind() at an earlier point to create my.dt together with an irregular lengthed list? If not, please step through your code running sapply(my.dt, length) to see where the invalidly lengthed column is being created. Armed with that we can make a workaround and also fix the potential bug. Thanks.
EDIT :
The original cryptic error message is now improved in v1.8.1, as follows :
NB: This method to create a data.table is not recommended because it lets you create an invalid data.table. Use it only if you are really sure the list is regular and you really do need speed (i.e. for speed you want to avoid the checks that as.data.table() and data.table() do), or if you need to demonstrate an invalid data.table, as I’m doing here.
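The snippet the NB refers to did not survive in this post, so here is a minimal sketch of the same idea rather than the original code. It assumes the invalid data.table is built from a plain list with unequal column lengths by setting the class attribute by hand with setattr(). The names bad.dt, ticker and price are made up for illustration. It first runs the sapply(..., length) check suggested above, then attempts setkey() and prints whatever message comes back, since the exact wording depends on the data.table version.

```r
# Hypothetical columns of unequal length, standing in for a malformed my.dt.
bad.dt <- list(
  ticker = c("MSFT.US", "AAPL.US", "IBM.US"),  # length 3, deliberately unsorted
  price  = c(1.5, 2.5, 3.5, 4.5)               # length 4: the irregular column
)

# The diagnostic: every column of a valid data.table has the same length.
col_lengths <- sapply(bad.dt, length)
irregular <- names(col_lengths)[col_lengths != col_lengths[1]]
print(col_lengths)
print(irregular)

# The data.table step is skipped if the package is not installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  # Assumption: stamping the class on a list bypasses the length checks that
  # data.table() and as.data.table() would normally perform.
  setattr(bad.dt, "class", c("data.table", "data.frame"))
  # Capture the outcome instead of stopping; the message text varies by version.
  result <- tryCatch(
    {
      setkey(bad.dt, ticker)
      "setkey completed without error"
    },
    error = function(e) conditionMessage(e)
  )
  print(result)
}
```

In the sketch, irregular names the column whose length differs from the first one. Run against the real my.dt after each construction step, the same check should pinpoint where the bad column appears.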