Skip to content

Conversation

@romainfrancois
Copy link
Member

closes #5901

library(dplyr, warn.conflicts = FALSE)

f <- function(x) { print(x, n = nrow(x) )}
bla <- tibble(i = 1:2, x = list(tibble(a = 1), tibble(a = 2))) %>% 
  rowwise() %>% 
  summarise(f(cur_data_all()))
#> # A tibble: 1 x 2
#>       i x               
#>   <int> <list>          
#> 1     1 <tibble [1 × 1]>
#> # A tibble: 1 x 2
#>       i x               
#>   <int> <list>          
#> 1     2 <tibble [1 × 1]>

Created on 2021-09-17 by the reprex package (v2.0.0)

@romainfrancois romainfrancois changed the title cur_data() / cur_data_all() not simplifying list columns cur_data() / cur_data_all() not simplifying list columns Sep 17, 2021
@romainfrancois
Copy link
Member Author

However, the data is still simplified as usual when used directly in the expression, or through across():

library(dplyr, warn.conflicts = FALSE)
library(tibble)

df <- tibble(x = list(tibble(a = 1), tibble(a = 2))) %>%
  rowwise()

summarise(df, is_tibble(x))
#> # A tibble: 2 x 1
#>   `is_tibble(x)`
#>   <lgl>         
#> 1 TRUE          
#> 2 TRUE
summarise(df, across(x, is_tibble))
#> # A tibble: 2 x 1
#>   x    
#>   <lgl>
#> 1 TRUE 
#> 2 TRUE

Created on 2021-09-17 by the reprex package (v2.0.0)


pick = function(vars) {
cols <- self$current_cols(vars)
if (inherits(private$data, "rowwise_df")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why we have to re-wrap in a list? (As opposed to not unwrapping somewhere?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It happens in this line:

if (rowwise && vctrs::vec_is_list(column) && Rf_length(column) > 0) {

void dplyr_lazy_vec_chop_grouped(SEXP chops_env, SEXP rows, SEXP data, bool rowwise) {
  SEXP names = PROTECT(Rf_getAttrib(data, R_NamesSymbol));
  R_xlen_t n = XLENGTH(data);

  const SEXP* p_data = VECTOR_PTR_RO(data);
  const SEXP* p_names = STRING_PTR_RO(names);
  for (R_xlen_t i = 0; i < n; i++) {
    SEXP prom = PROTECT(Rf_allocSExp(PROMSXP));
    SET_PRENV(prom, R_EmptyEnv);
    SEXP column = p_data[i];
    if (rowwise && vctrs::vec_is_list(column) && Rf_length(column) > 0) {
      SET_PRCODE(prom, column);
    } else {
      SET_PRCODE(prom, Rf_lang3(dplyr::functions::vec_chop, column, rows));
    }
    SET_PRVALUE(prom, R_UnboundValue);

    Rf_defineVar(rlang::str_as_symbol(p_names[i]), prom, chops_env);
    UNPROTECT(1);
  }

  UNPROTECT(1);
}

When we setup the promise that makes the chops, and this is a rowwise data, we can skip using vec_chop() and simply use the list columns as the chops.

This gives us the rowwise simplification for when we refer to the list column in an expression (not having to [[1]]).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and for non-original columns, i.e. those that are added, they go through the <DataMask>$add_one() method:

add_one = function(name, chunks, result) {
      if (inherits(private$data, "rowwise_df")){
        is_scalar_list <- function(.x) {
          vec_is_list(.x) && length(.x) == 1L
        }
        if (all(map_lgl(chunks, is_scalar_list))) {
          chunks <- map(chunks, `[[`, 1L)
        }
      }

      .Call(`dplyr_mask_add`, private, name, result, chunks)
    },

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah makes sense. Thanks!

@romainfrancois romainfrancois merged commit f21f007 into master Sep 21, 2021
@romainfrancois romainfrancois deleted the 5901_cur_data_rowwise branch September 21, 2021 07:47
@romainfrancois romainfrancois added this to the 1.0.8 milestone Sep 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cur_data_all() for rowwise tibble un-nests list variables

3 participants