The SO question that triggered the thought:
Instead of having to write:
DT[, length(unique(.)), by=.]
we could write:
DT[, n_unique(.), by=.]
This would be especially fast for data.tables, because counting the unique rows would no longer require subsetting the entire data.table with unique() first.
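A minimal sketch of the multi-column case follows. It assumes uniqueness is defined over all columns, and the column names `a` and `b` are made up for illustration. `unique(DT)` builds a deduplicated copy only so that `nrow()` can count it. The grouping returned by the unexported `forderv()` already carries one `starts` entry per distinct row, so no copy is needed. Because `forderv()` is internal, its arguments may change between releases.

```r
library(data.table)

set.seed(1)
DT <- data.table(a = sample(10, 1e5, TRUE), b = sample(10, 1e5, TRUE))

# Builds a deduplicated copy of DT just to count its rows
n1 <- nrow(unique(DT))

# Counts the groups directly from the ordering; no subset of DT is created.
# forderv() is unexported, hence the ':::' (interface may change)
n2 <- length(attr(data.table:::forderv(DT, retGrp = TRUE), "starts"))

stopifnot(n1 == n2)
```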
Here's a quick benchmark:
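library(data.table)
x = sample(1e2, 1e7, TRUE)  # 1e7 draws from only 100 distinct values
system.time(ans1 <- length(unique(x)))  # 0.667 seconds
# forderv() is unexported; retGrp=TRUE attaches the start index of each group,
# so the number of starts equals the number of unique values
system.time(ans2 <- length(attr(data.table:::forderv(x, retGrp=TRUE), 'starts')))  # 0.1 seconds
stopifnot(ans1 == ans2)  # both approaches agree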
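The benchmarked expression can be wrapped in a small helper. The sketch below names it `n_unique()` to match the proposal, but it is a hypothetical user-level function and not part of data.table's exported API. It relies on the internal `forderv()`, which may change without notice. Like `unique()`, it counts `NA` as one distinct value, because `forderv()` places all `NA`s in a single group.

```r
library(data.table)

# Hypothetical helper: number of distinct values of x, computed from the
# group starts of the internal ordering instead of materialising unique(x)
n_unique <- function(x) {
  o <- data.table:::forderv(x, retGrp = TRUE)
  # one 'starts' entry per group, i.e. per distinct value (NA forms one group)
  length(attr(o, "starts"))
}

# Agrees with length(unique(.)), including the NA group
stopifnot(n_unique(c(3, 1, 3, NA, 1)) == length(unique(c(3, 1, 3, NA, 1))))

# Used by group, as in the proposed DT[, n_unique(.), by=.] form
DT <- data.table(g = rep(1:2, each = 5), v = c(1, 1, 2, 3, 3, 4, 4, 4, 5, NA))
res <- DT[, .(n = n_unique(v)), by = g]
```

In this sketch `res$n` is 3 for both groups: group 1 holds 1, 2, 3 and group 2 holds 4, 5, NA.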
In addition, we could internally optimise length(unique(.)) to n_unique(.), so that existing code written in the longer form would also benefit.
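One way such an internal rewrite could work is sketched below as a standalone function operating on an unevaluated `j` expression. The function name `optimise_j()` is hypothetical, and this is not how data.table's own `j` optimisation is implemented. The sketch only replaces the exact pattern `length(unique(<arg>))` with `n_unique(<arg>)`, and it recurses only into nested calls, so symbols, constants and empty arguments are left untouched.

```r
# Hypothetical sketch: rewrite length(unique(e)) into n_unique(e) inside a
# quoted j expression, leaving everything else unchanged
optimise_j <- function(e) {
  if (!is.call(e)) return(e)
  # Match exactly length(unique(<one argument>))
  if (identical(e[[1L]], as.name("length")) && length(e) == 2L &&
      is.call(e[[2L]]) && identical(e[[2L]][[1L]], as.name("unique")) &&
      length(e[[2L]]) == 2L) {
    return(call("n_unique", e[[2L]][[2L]]))
  }
  # Otherwise recurse into arguments that are themselves calls,
  # e.g. list(N = length(unique(v))); argument names are preserved
  for (i in seq_along(e)[-1L]) {
    if (is.call(e[[i]])) e[[i]] <- optimise_j(e[[i]])
  }
  e
}

optimise_j(quote(length(unique(v))))
optimise_j(quote(list(N = length(unique(v)), S = sum(w))))
```

Matching the whole pattern, rather than `unique()` alone, keeps other uses of `unique()` in `j` (for example `unique(v)[1]`) untouched.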