I am trying to exclude some rows from a data.table based on, say, day and month: for example, summer holidays that always begin on 15 June and end on 15 July. I can extract those days based on Date, but since as.Date is awfully slow to work with, I have separate integer columns for Month and Day and I want to do it using only them.
It is easy to select the given entries by
DT[Month==6][Day>=15]
DT[Month==7][Day<=15]
Is there a way to take the “difference” of the two data.tables (the original one and the rows I selected)? (Why not a plain subset? Maybe I am missing something simple, but I don’t want to exclude days like 10/6 or 31/7.)
I am aware of a way to do it with a join, but only day by day:
setkey(DT, Month, Day)
DT[-DT[J(Month,Day), which= TRUE]]
Can anyone suggest how to solve this in a more general way?
Great question. I’ve edited the question title to match the question.
A simple approach avoiding as.Date, which reads nicely:
That’s probably fast enough in many cases. If you have a lot of different ranges, then you may want to step up a gear:
That’s a bit long and error prone because it’s DIY. So one idea is that a list column in an i table would represent a range query (FR#203, like a binary search %between%). Then a not-join (also not yet implemented, FR#1384) could be combined with the list column range query to do exactly what you asked:
That would extend to multiple different ranges, or the same range for many different ids, in the usual way; i.e., more rows added to i.
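As a minimal sketch of the ideas above (not necessarily the code originally posted), the single-range case can be written by encoding Month and Day as one integer and negating a %between% test. The multi-range case can use a table of ranges with a non-equi join. The example data and the names mmdd, ranges, hit, kept_one and kept_many are hypothetical. The non-equi join in on= assumes data.table 1.9.8 or later.

```r
library(data.table)

# Hypothetical example data: one row per day of 2013 (not from the original post)
dates <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
DT <- data.table(
  Month = as.integer(format(dates, "%m")),
  Day   = as.integer(format(dates, "%d"))
)

# Single range: encode (Month, Day) as the integer Month*100 + Day
# (15 June -> 615, 15 July -> 715) and drop rows inside [615, 715].
kept_one <- DT[!((Month * 100L + Day) %between% c(615L, 715L))]

# Several ranges: one row per excluded range in a hypothetical `ranges` table.
DT[, mmdd := Month * 100L + Day]
ranges <- data.table(from = c(615L, 1224L), to = c(715L, 1231L))

# Non-equi join returning the row numbers of DT that fall inside any range,
# then keep every other row.
hit <- DT[ranges, on = .(mmdd >= from, mmdd <= to), which = TRUE, nomatch = 0L]
kept_many <- DT[setdiff(seq_len(nrow(DT)), hit)]
```

Days such as 10/6 and 31/7 stay in the result because the comparison is on the combined integer rather than on Month and Day separately. A range that wraps around the year end (say 20 December to 5 January) would need to be split into two rows of ranges under this encoding.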