As of dplyr 1.0.0, summarize() creates one row per element of the summary function's return value, so a group can produce several rows or none at all. This leads to unintended behavior when the vector return is accidental, and it can silently lose data: in the example below, the n = 0 group disappears from the result because rep(0, 0) has length zero.
library(conflicted)
library(dplyr)
my_custom_summary_function <- function(n) {
  # Should return a scalar, but I accidentally return a vector
  rep(n, n)
}
tibble(n = 2:0) %>%
  group_by(n) %>%
  summarize(out = my_custom_summary_function(n), .groups = "drop") %>%
  ungroup()
#> # A tibble: 3 × 2
#>       n   out
#>   <int> <int>
#> 1     1     1
#> 2     2     2
#> 3     2     2

Created on 2022-08-01 by the reprex package (v2.0.1)
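To make the data loss visible, the result can be compared against the input groups. The following is a minimal sketch, not part of the original report; the names `input`, `output`, `dropped`, and `rows_per_group` are chosen here for illustration. It uses `anti_join()` to list groups that vanished and `count()` to list how many rows each surviving group produced. Depending on the installed dplyr version, the multi-row `summarize()` call may also emit a warning.

```r
library(dplyr)

input <- tibble(n = 2:0)

# Same accidental vector return as in the reprex above
output <- input %>%
  group_by(n) %>%
  summarize(out = rep(n, n), .groups = "drop")

# Groups present in the input but absent from the summary (here: n = 0)
dropped <- anti_join(input, output, by = "n")

# Rows produced per surviving group; the column `rows` is not 1 everywhere
rows_per_group <- count(output, n, name = "rows")

dropped
rows_per_group
```

Neither check is automatic, which is the point of the issue: the caller has to know to look for missing or duplicated groups.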
Should we introduce a .multi = c("allow", "require", "fail") argument that supports the pre-1.0.0 strict mode of operation? Should .multi = "fail" even be the default?
library(conflicted)
library(dplyr)
my_custom_summary_function <- function(n) {
  # Should return a scalar, but I accidentally return a vector
  rep(n, n)
}
tibble(n = 2:0) %>%
  group_by(n) %>%
  summarize(out = my_custom_summary_function(n), .groups = "drop", .multi = "fail") %>%
  ungroup()
## Error: `out` has length != 1 in groups 1, 3, use `.multi = "allow"` if this is intended

Imagined on 2022-08-01 by the reprex package (v2.0.1)
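Since `.multi` is only a proposal, the strict behavior can be approximated in user code in the meantime. The sketch below wraps the summary expression in a hypothetical helper, `assert_scalar()`, which is not part of dplyr and is named here purely for illustration. It stops as soon as a group returns anything other than a length-1 value, so unlike the imagined error above it reports the offending length rather than the full list of groups.

```r
library(dplyr)

# Hypothetical helper, not part of dplyr: return `x` unchanged when it has
# length 1, otherwise signal an error naming the offending column.
assert_scalar <- function(x, arg = "value") {
  if (length(x) != 1L) {
    stop(
      sprintf("`%s` has length %d, expected 1", arg, length(x)),
      call. = FALSE
    )
  }
  x
}

my_custom_summary_function <- function(n) {
  # Should return a scalar, but accidentally returns a vector
  rep(n, n)
}

# tryCatch() is used only so the sketch runs to completion;
# in real code the error would simply propagate.
result <- tryCatch(
  tibble(n = 2:0) %>%
    group_by(n) %>%
    summarize(
      out = assert_scalar(my_custom_summary_function(n), "out"),
      .groups = "drop"
    ),
  error = function(e) e
)

inherits(result, "error")
```

This puts the burden on every call site, which is why an argument on `summarize()` itself, possibly defaulting to the strict mode, would be the safer design.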