Hi,
I am not sure whether to frame this as a question, a suggestion for a warning in the docs, or a feature request...
This is how shift() currently works on data with irregular intervals:
> DT <- data.table(time = 1:5, value = c("A", "B", "C", "D", "E"))[!3] # make irregular by dropping one row
> setkey(DT, time)
> DT
   time value
1:    1     A
2:    2     B
3:    4     D
4:    5     E
> DT[, lag_value := shift(x = value, n = 1, fill = NA, type = "lag")]
> DT
   time value lag_value
1:    1     A        NA
2:    2     B         A
3:    4     D         B
4:    5     E         D
My issue is the lagged value for time == 4. I'd expect this behaviour if I had asked for rolling explicitly, but not from a lag. The documentation for shift describes the n argument as "...periods to lead/lag by..", so with n = 1 the value should arguably not be silently lagged by 2 periods. For comparison, lag {stats} describes its argument k as "The number of lags (in units of observations)".
What the user (if they tend to think like me, that is) probably meant was:
> DT[.(c(min(time):max(time))), lag_value := shift(x = value, n = 1, fill = NA, type = "lag"), nomatch = NA]
> DT
   time value lag_value
1:    1     A        NA
2:    2     B         A
3:    4     D        NA
4:    5     E         D
or even
> DT[CJ(c(min(time):max(time))), j = .(time, value, lag_value = shift(x = value, n = 1, fill = NA, type = "lag")), nomatch = NA]
   time value lag_value
1:    1     A        NA
2:    2     B         A
3:    3    NA         B
4:    4     D        NA
5:    5     E         D
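For comparison, the first of the two results above can also be sketched without a join, by looking up each row's `time - 1` in the `time` column with base R's `match()`. This is only a workaround, and it assumes that `time` is an integer column with unique values and a period of 1; it is not something `shift()` does itself.

```r
library(data.table)

# Same irregular series as above (the row for time == 3 is missing)
DT <- data.table(time = c(1L, 2L, 4L, 5L), value = c("A", "B", "D", "E"))

# For each row, find the row whose time is exactly one period earlier.
# match() returns NA when no such row exists, so the lag becomes NA there.
DT[, lag_value := value[match(time - 1L, time)]]
DT
```

Because the lookup is by time rather than by row position, the row with time == 4 gets NA instead of the value from time == 2.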
I understand this is an issue only for ordered, regular, time-like data, but it may still be worth mentioning in the documentation, or adding a warning?
Or the fill = argument (which currently behaves more like pad_by =) could be modified to provide the behaviour suggested above?
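Until something like that exists, a user-side check can flag the situation before calling shift(). The sketch below is an assumption about how one might do it, not a data.table feature: `has_gaps` is a hypothetical name, and the check assumes an integer `time` column with a period of 1.

```r
library(data.table)

DT <- data.table(time = c(1L, 2L, 4L, 5L), value = c("A", "B", "D", "E"))
setkey(DT, time)  # sort by time so that diff() compares neighbouring periods

# TRUE when at least one pair of consecutive rows is not exactly one period apart
has_gaps <- any(diff(DT$time) != 1L)

if (has_gaps) {
  message("time has gaps: shift() lags by row position, not by period")
}
```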
Thanks!