[R-Forge #5222] 'not found' when DT[, list(sum(non-.SD-col), lapply(.SD,mean)), by=..., .SDcols=...] #495

@arunsrinivasan

Submitted by: Matt Weller; Assigned to: Nobody; R-Forge link

When using .SDcols (to apply a function to multiple columns), I cannot reference other columns of the original table (such as v1) in j using the following syntax:

dt = data.table(grp=c(2,3,3,1,1,2,3), v1=1:7, v2=7:1, v3=10:16)
dt.out = dt[, list(v1 = sum(v1), lapply(.SD,mean)), by = grp, .SDcols = v2:v3]
# Error in `[.data.table`(dt, , list(v1 = sum(v1), lapply(.SD, mean)), by = grp,  : 
#   object 'v1' not found

A similar error happens when I use c instead of list; clearly, the column v1 cannot be accessed within the j clause.

As a workaround, I resorted to the following code, which includes v1 in .SDcols even though I do not want it in the lapply portion; the resulting mean of v1 then has to be dropped after the computation.

sd.cols = c("v1","v2", "v3")
dt.out = dt[, c(sum.v1 = sum(v1), lapply(.SD,mean)), by = grp, .SDcols = sd.cols]
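
A possible alternative that avoids the extra column is sketched below. It is only an illustration, not a fix for the underlying bug: it assumes that dropping .SDcols and naming the columns to average explicitly in j is acceptable, which keeps v1 visible to sum(). The names sum.v1 and dt.out follow the code above.

```r
library(data.table)

dt <- data.table(grp = c(2, 3, 3, 1, 1, 2, 3), v1 = 1:7, v2 = 7:1, v3 = 10:16)

# No .SDcols here: the columns to average are listed by hand,
# so every column referenced in j (including v1) is available per group.
dt.out <- dt[, c(list(sum.v1 = sum(v1)),
                 lapply(list(v2 = v2, v3 = v3), mean)),
             by = grp]

print(dt.out)
```

The result has one row per grp with the columns grp, sum.v1, v2 and v3, so nothing needs to be dropped afterwards. The cost is that the column names are written out by hand, which is less convenient than .SDcols when there are many columns.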

According to eddi on Stack Overflow, this is a bug, and he has asked me to report it. I cannot provide much more detail, as I'm not exactly sure which part he thinks is the bug; the accepted answer by Arun and their ensuing discussion highlight where the problem lies.

Here is the relevant SO post.
