
v1.1.0 runtime for case_when with grouping variable is slow #6674

@fawda123

Using case_when in a mutate call on grouped data is much slower in v1.1.0 than in v1.0.10. In the benchmarks below, the median runtime rises from about 124 ms to about 2.6 s, roughly 20 times slower. The code still works, but the slowdown is hitting many of the packages I maintain (see here; many examples now have elapsed times >5 s).
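If the regression is per-group overhead in case_when, the runtime should grow with the number of groups even when the number of rows stays fixed. A minimal sketch to check that, assuming the same 1000-row setup as the reprex; the helper time_with_groups() and the column g are hypothetical names used only for illustration:

```r
library(dplyr, warn.conflicts = FALSE)

set.seed(1)
n <- 1000
dat <- data.frame(y = rnorm(n))

# Hypothetical helper: time the same case_when() call with a given number of
# groups. The row count stays fixed at n; only the number of groups changes.
time_with_groups <- function(n_groups) {
  grouped <- dat %>%
    mutate(g = rep_len(seq_len(n_groups), n)) %>%
    group_by(g)
  system.time(
    mutate(grouped, z = case_when(y < 0 ~ "-", TRUE ~ "+"))
  )[["elapsed"]]
}

n_groups <- c(1, 10, 100, 1000)
timings <- vapply(n_groups, time_with_groups, numeric(1))

# Elapsed seconds per group count
data.frame(n_groups, elapsed_s = timings)
```

Running this under both dplyr versions should show whether the gap widens as the group count increases.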

Here's a reprex for v1.1.0.

library(dplyr, warn.conflicts = FALSE)
library(microbenchmark)

n <- 1000
dat <- data.frame(
    x = seq_len(n),
    y = rnorm(n)
)

microbenchmark(
    dat %>%
        group_by(x) %>%
        mutate(
            z = case_when(
                y < 0 ~ '-',
                T ~ '+',
            )
        ),
    times = 100
)
#> Unit: seconds
#>                                                                        expr
#>  dat %>% group_by(x) %>% mutate(z = case_when(y < 0 ~ "-", T ~      "+", ))
#>       min       lq     mean   median       uq      max neval
#>  2.376748 2.537896 2.650869 2.625663 2.723655 3.170204   100

Created on 2023-02-01 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> - Session info  --------------------------------------------------------------
#>  hash: person in steamy room: medium-dark skin tone, goat, black small square
#> 
#>  setting  value
#>  version  R version 4.1.3 (2022-03-10)
#>  os       Windows 10 x64 (build 22000)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.1252
#>  ctype    English_United States.1252
#>  tz       America/New_York
#>  date     2023-02-01
#>  pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> - Packages -------------------------------------------------------------------
#>  package        * version date (UTC) lib source
#>  cli              3.6.0   2023-01-09 [1] CRAN (R 4.1.3)
#>  digest           0.6.31  2022-12-11 [1] CRAN (R 4.1.3)
#>  dplyr          * 1.1.0   2023-01-29 [1] CRAN (R 4.1.3)
#>  evaluate         0.20    2023-01-17 [1] CRAN (R 4.1.3)
#>  fansi            1.0.4   2023-01-22 [1] CRAN (R 4.1.3)
#>  fastmap          1.1.0   2021-01-25 [1] CRAN (R 4.1.2)
#>  fs               1.6.0   2023-01-23 [1] CRAN (R 4.1.3)
#>  generics         0.1.3   2022-07-05 [1] CRAN (R 4.1.3)
#>  glue             1.6.2   2022-02-24 [1] CRAN (R 4.1.3)
#>  htmltools        0.5.4   2022-12-07 [1] CRAN (R 4.1.3)
#>  knitr            1.42    2023-01-25 [1] CRAN (R 4.1.3)
#>  lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.1.3)
#>  magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.1.3)
#>  microbenchmark * 1.4.9   2021-11-09 [1] CRAN (R 4.1.3)
#>  pillar           1.8.1   2022-08-19 [1] CRAN (R 4.1.3)
#>  pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.1.2)
#>  purrr            1.0.1   2023-01-10 [1] CRAN (R 4.1.3)
#>  R.cache          0.15.0  2021-04-30 [1] CRAN (R 4.1.3)
#>  R.methodsS3      1.8.1   2020-08-26 [1] CRAN (R 4.1.1)
#>  R.oo             1.24.0  2020-08-26 [1] CRAN (R 4.1.1)
#>  R.utils          2.11.0  2021-09-26 [1] CRAN (R 4.1.3)
#>  R6               2.5.1   2021-08-19 [1] CRAN (R 4.1.2)
#>  reprex           2.0.2   2022-08-17 [1] CRAN (R 4.1.3)
#>  rlang            1.0.6   2022-09-24 [1] CRAN (R 4.1.3)
#>  rmarkdown        2.20    2023-01-19 [1] CRAN (R 4.1.3)
#>  rstudioapi       0.13    2020-11-12 [1] CRAN (R 4.1.2)
#>  sessioninfo      1.2.1   2021-11-02 [1] CRAN (R 4.1.2)
#>  styler           1.7.0   2022-03-13 [1] CRAN (R 4.1.3)
#>  tibble           3.1.8   2022-07-22 [1] CRAN (R 4.1.3)
#>  tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.1.3)
#>  utf8             1.2.2   2021-07-24 [1] CRAN (R 4.1.2)
#>  vctrs            0.5.2   2023-01-23 [1] CRAN (R 4.1.3)
#>  withr            2.5.0   2022-03-03 [1] CRAN (R 4.1.3)
#>  xfun             0.36    2022-12-21 [1] CRAN (R 4.1.3)
#>  yaml             2.3.7   2023-01-23 [1] CRAN (R 4.1.3)
#> 
#>  [1] C:/Users/mbeck/R/win-library
#>  [2] C:/Program Files/R/R-4.1.3/library
#> 
#> ------------------------------------------------------------------------------

And here's the same reprex for v1.0.10 (note that these times are in milliseconds, whereas the times above are in seconds).

library(dplyr, warn.conflicts = FALSE)
library(microbenchmark)

n <- 1000
dat <- data.frame(
    x = seq_len(n),
    y = rnorm(n)
)

microbenchmark(
    dat %>%
        group_by(x) %>%
        mutate(
            z = case_when(
                y < 0 ~ '-',
                T ~ '+',
            )
        ),
    times = 100
)
#> Unit: milliseconds
#>                                                                        expr
#>  dat %>% group_by(x) %>% mutate(z = case_when(y < 0 ~ "-", T ~      "+", ))
#>       min       lq     mean  median       uq      max neval
#>  114.9103 120.9102 126.9423 123.889 128.7439 167.7735   100

Created on 2023-02-01 with reprex v2.0.2
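Putting both benchmarks on the same unit makes the size of the regression explicit. This sketch only reuses the median and mean values reported by microbenchmark above, converting the v1.0.10 figures from milliseconds to seconds; the variable names are illustrative:

```r
# Median and mean runtimes copied from the two microbenchmark outputs above
v1_1_0  <- c(median = 2.625663, mean = 2.650869)          # seconds (dplyr 1.1.0)
v1_0_10 <- c(median = 123.889,  mean = 126.9423) / 1000   # ms converted to seconds (dplyr 1.0.10)

# How many times slower v1.1.0 is than v1.0.10
ratio <- v1_1_0 / v1_0_10
round(ratio, 1)
```

For both the median and the mean this works out to roughly 21 times slower.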

Session info
sessioninfo::session_info()
#> - Session info  --------------------------------------------------------------
#>  hash: open mailbox with raised flag, love-you gesture: medium skin tone, snowboarder: light skin tone
#> 
#>  setting  value
#>  version  R version 4.1.3 (2022-03-10)
#>  os       Windows 10 x64 (build 22000)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.1252
#>  ctype    English_United States.1252
#>  tz       America/New_York
#>  date     2023-02-01
#>  pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> - Packages -------------------------------------------------------------------
#>  package        * version date (UTC) lib source
#>  assertthat       0.2.1   2019-03-21 [1] CRAN (R 4.1.2)
#>  cli              3.6.0   2023-01-09 [1] CRAN (R 4.1.3)
#>  DBI              1.1.3   2022-06-18 [1] CRAN (R 4.1.3)
#>  digest           0.6.31  2022-12-11 [1] CRAN (R 4.1.3)
#>  dplyr          * 1.0.10  2022-09-01 [1] CRAN (R 4.1.3)
#>  evaluate         0.20    2023-01-17 [1] CRAN (R 4.1.3)
#>  fansi            1.0.4   2023-01-22 [1] CRAN (R 4.1.3)
#>  fastmap          1.1.0   2021-01-25 [1] CRAN (R 4.1.2)
#>  fs               1.6.0   2023-01-23 [1] CRAN (R 4.1.3)
#>  generics         0.1.3   2022-07-05 [1] CRAN (R 4.1.3)
#>  glue             1.6.2   2022-02-24 [1] CRAN (R 4.1.3)
#>  htmltools        0.5.4   2022-12-07 [1] CRAN (R 4.1.3)
#>  knitr            1.42    2023-01-25 [1] CRAN (R 4.1.3)
#>  lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.1.3)
#>  magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.1.3)
#>  microbenchmark * 1.4.9   2021-11-09 [1] CRAN (R 4.1.3)
#>  pillar           1.8.1   2022-08-19 [1] CRAN (R 4.1.3)
#>  pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.1.2)
#>  purrr            1.0.1   2023-01-10 [1] CRAN (R 4.1.3)
#>  R.cache          0.15.0  2021-04-30 [1] CRAN (R 4.1.3)
#>  R.methodsS3      1.8.1   2020-08-26 [1] CRAN (R 4.1.1)
#>  R.oo             1.24.0  2020-08-26 [1] CRAN (R 4.1.1)
#>  R.utils          2.11.0  2021-09-26 [1] CRAN (R 4.1.3)
#>  R6               2.5.1   2021-08-19 [1] CRAN (R 4.1.2)
#>  reprex           2.0.2   2022-08-17 [1] CRAN (R 4.1.3)
#>  rlang            1.0.6   2022-09-24 [1] CRAN (R 4.1.3)
#>  rmarkdown        2.20    2023-01-19 [1] CRAN (R 4.1.3)
#>  rstudioapi       0.13    2020-11-12 [1] CRAN (R 4.1.2)
#>  sessioninfo      1.2.1   2021-11-02 [1] CRAN (R 4.1.2)
#>  styler           1.7.0   2022-03-13 [1] CRAN (R 4.1.3)
#>  tibble           3.1.8   2022-07-22 [1] CRAN (R 4.1.3)
#>  tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.1.3)
#>  utf8             1.2.2   2021-07-24 [1] CRAN (R 4.1.2)
#>  vctrs            0.5.2   2023-01-23 [1] CRAN (R 4.1.3)
#>  withr            2.5.0   2022-03-03 [1] CRAN (R 4.1.3)
#>  xfun             0.36    2022-12-21 [1] CRAN (R 4.1.3)
#>  yaml             2.3.7   2023-01-23 [1] CRAN (R 4.1.3)
#> 
#>  [1] C:/Users/mbeck/R/win-library
#>  [2] C:/Program Files/R/R-4.1.3/library
#> 
#> ------------------------------------------------------------------------------
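One possible workaround while this is open, assuming the case_when conditions only use row-wise values (as in the reprex, where y < 0 does not depend on the group): evaluate case_when once on the ungrouped data and add the grouping afterwards. This is a sketch that sidesteps the per-group evaluation, not a fix for the underlying regression; per_group and hoisted are illustrative names:

```r
library(dplyr, warn.conflicts = FALSE)

set.seed(1)
n <- 1000
dat <- data.frame(x = seq_len(n), y = rnorm(n))

# Original pattern: case_when() is evaluated once per group (1000 groups here)
per_group <- dat %>%
  group_by(x) %>%
  mutate(z = case_when(y < 0 ~ "-", TRUE ~ "+"))

# Workaround (assumes the condition is purely row-wise, so grouping cannot
# change the result): evaluate case_when() once, then group afterwards
hoisted <- dat %>%
  mutate(z = case_when(y < 0 ~ "-", TRUE ~ "+")) %>%
  group_by(x)

# Both approaches should produce the same z column
identical(per_group$z, hoisted$z)
```

This does not help when the conditions genuinely depend on group-level values (for example, comparing y to mean(y) within each group).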
