(This is a cleanup and improvement of some of the #4589 discussion.)
Take this code:
> library(data.table)
> d1 <- data.frame(a=c(1,2,3,4,5), b=c(2,3,4,5,6))
> d2 <- d1
> setDT(d2) # At this point d2 is a shallow copy of d1, pointing to the same columns
Do modifications to d2 impact d1? We could live with either 'yes' or 'no', but the answer is 'sometimes':
d2[, b:=3:7] # (1) impacts only d2
d2[, c:=4:8] # (2) impacts only d2
d2[!is.na(a), b:=5:9] # (3) impacts both
d2[, b:=30] # (4) impacts both
In cases 1 and 2, d2 'plunks' the full new columns into itself and d1 isn't affected. In cases 3 and 4, it seems that an operate-in-place optimization kicks in (address(d2$b) is unchanged), so there is no copy-on-write and the data still pointed to by d1 is overwritten.
These semantic discrepancies make (the otherwise great) setDT unusable for us except in the most trivial scripts.
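For comparison, here is a minimal sketch of one way to sidestep the sharing, assuming the extra memory of a deep copy is acceptable: take a `copy()` of d1 before calling `setDT()`, so that d2 owns its column vectors. The helper variables `shared_b` and `d1_unchanged` are only illustrative names, not part of the original report.

```r
library(data.table)

d1 <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 3, 4, 5, 6))

# Deep-copy first, so d2 has its own column vectors before setDT() converts it.
d2 <- copy(d1)
setDT(d2)

# Compare the addresses of the b columns; after a deep copy they should differ.
shared_b <- identical(address(d1$b), address(d2$b))

# Sub-assignment by reference (case 3 above) now only touches d2's own column.
d2[!is.na(a), b := 5:9]

# d1 should still hold its original values.
d1_unchanged <- identical(d1$b, c(2, 3, 4, 5, 6))
```

This trades the memory saving of the shallow copy for predictable semantics. It does not address the inconsistency between cases 1 to 4 itself.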
# Output of sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
> packageVersion("data.table")
[1] ‘1.13.0’