shift() behaviour with missing periods #1530

@pstoyanov

Hi,
I am not sure whether to formulate this as a question, a suggestion for warning in the docs or a feature request...

The way in which shift() currently works (with irregular intervals) is like this:

> DT <- data.table(time = 1:5, value = c("A", "B", "C", "D", "E"))[!3] # make irregular by dropping one row
> setkey(DT, time)
> DT
   time value
1:    1     A
2:    2     B
3:    4     D
4:    5     E
> DT[, lag_value := shift(x = value, n = 1, fill = NA, type = "lag")]
> DT
   time value lag_value
1:    1     A        NA
2:    2     B         A
3:    4     D         B
4:    5     E         D

My issue is the lagged value for time == 4: it is B, the value from time == 2, i.e. two periods back. I'd expect behaviour like this if I had asked for rolling explicitly, but not from a plain lag. The documentation for shift specifically refers to the n argument as "...periods to lead/lag by..", so with n = 1 the value arguably should not be silently lagged by 2 periods. For comparison, lag {stats} describes its argument k as "The number of lags (in units of observations)".
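
To make the mismatch visible, the number of periods each row-based lag actually spans can be computed directly. This is only a minimal sketch on the same toy table; the column name periods_lagged is made up for illustration and assumes time is an integer period index.

```r
library(data.table)

# Same irregular table as above: the row for time == 3 is dropped
DT <- data.table(time = 1:5, value = c("A", "B", "C", "D", "E"))[!3]
setkey(DT, time)

# shift() moves by rows, so the distance in periods is the difference
# between each time and the time of the previous row
# (periods_lagged is a hypothetical column name)
DT[, periods_lagged := time - shift(time, n = 1, type = "lag")]
DT
```

Any row where periods_lagged differs from n marks a place where the row-based lag does not correspond to the intended number of periods.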

What the user (if they tend to think like me, that is) probably meant was:

> DT[.(c(min(time):max(time))), lag_value := shift(x = value, n = 1, fill = NA, type = "lag"), nomatch = NA]
> DT
   time value lag_value
1:    1     A        NA
2:    2     B         A
3:    4     D        NA
4:    5     E         D

or even

> DT[CJ(c(min(time):max(time))), j = .(time, value, lag_value = shift(x = value, n = 1, fill = NA, type = "lag")), nomatch = NA]
   time value lag_value
1:    1     A        NA
2:    2     B         A
3:    3    NA         B
4:    4     D        NA
5:    5     E         D
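
A join-based variant that lags by period rather than by row, without expanding the table first, might look like the following. It is a sketch under assumptions: time is an integer index with unit spacing, and prev_value is an illustrative name, not part of data.table.

```r
library(data.table)

# Same irregular table as above: the row for time == 3 is dropped
DT <- data.table(time = 1:5, value = c("A", "B", "C", "D", "E"))[!3]

# Look up, for every row, the value recorded exactly one period earlier.
# Unmatched lookups (time - 1 not present) return NA by default.
prev_value <- DT[.(time = DT$time - 1L), on = "time", value]

# Attach the period-aware lag to the original rows
DT[, lag_value := prev_value]
DT
```

With this approach the row for time == 4 gets NA, because no observation exists for time == 3, while time == 5 still picks up D.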

I understand this is an issue only for ordered, regularly spaced, time-like data, but it may still be worth mentioning in the documentation or adding a warning.
Alternatively, the fill = argument (which currently behaves more like a pad_by =) could be modified to provide the behaviour suggested above.

Thanks!
