I have a data.table which has two keys: Year (10 levels) and MemberID (200,000 levels). When I set the key, does setkey(MemberID, Year) result in different performance compared with setkey(Year, MemberID)? If so, which way is better?
The performance and speed of the key setting will depend on the key variable types.
`numeric` columns will be slower than `integer`. `character` columns (when short strings) appear to be fast.
Some timings
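The original timing code is not reproduced here. As a minimal sketch of how such a comparison could be run, the snippet below times `setkey` on the same values stored as `integer`, `numeric` and `character`. The table names, column names and row count are made up for illustration, and the results will depend on your machine and data.table version, so no timings are quoted.

```r
library(data.table)

set.seed(1)
n <- 1e6L                                   # hypothetical row count
ids <- sample(200000L, n, replace = TRUE)   # 200,000 distinct member ids

# Same key values, stored as three different column types
DT_int <- data.table(id = ids,               x = runif(n))
DT_num <- data.table(id = as.numeric(ids),   x = runif(n))
DT_chr <- data.table(id = as.character(ids), x = runif(n))

# Elapsed seconds for setting the key on each type
timings <- c(
  integer   = system.time(setkey(DT_int, id))[["elapsed"]],
  numeric   = system.time(setkey(DT_num, id))[["elapsed"]],
  character = system.time(setkey(DT_chr, id))[["elapsed"]]
)
print(timings)
```

Running this a few times (or on a larger `n`) gives a better picture than a single run.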
As to which way will be better: this will probably depend on which column you are most likely to subset by on its own.
For example, you may need to get all the data for year 1.
If you have set the key as `year, id`, then you can subset on the year value alone, but if the key was set as `id, year`, then you would also need to supply `unique(id)` for the first key column, which is more typing and will take longer as it has to calculate `unique(id)`.

There is a feature request FR#1007 that looks at allowing a secondary key, but this is not implemented yet. Currently there is a single key that can occupy more than one column.
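The two key orders discussed above can be sketched as follows. This is an illustrative example on a small made-up table (`DT`, `year`, `id` and `value` are assumed names), not the original code from this answer.

```r
library(data.table)

set.seed(1)
# Hypothetical table: every combination of 10 years and 1,000 ids
DT <- CJ(year = 1:10, id = 1:1000)
DT[, value := rnorm(.N)]

# Key is (year, id): year is the first key column,
# so a join on the year value alone is enough
setkey(DT, year, id)
year1_a <- DT[J(1L)]

# Key is (id, year): the first key column must be supplied too,
# so every id has to be listed via unique(id)
setkey(DT, id, year)
year1_b <- DT[J(unique(id), 1L)]
```

Both calls return the rows for year 1; the second simply has to build the full vector of ids before it can join.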