Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# dplyr (development version)

* `cur_data()` and `cur_data_all()` not simplifying list columns rowwise data frames (#5901).

* `coalesce()` accepts 1-D arrays (#5557).

* `filter()` forbids matrix results (#5973) and warns about data frame
Expand Down
8 changes: 8 additions & 0 deletions R/data-mask.R
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,14 @@ DataMask <- R6Class("DataMask",

pick = function(vars) {
cols <- self$current_cols(vars)
if (inherits(private$data, "rowwise_df")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why we have to re-wrap in a list? (As opposed to not unwrapping somewhere?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It happens in this line:

if (rowwise && vctrs::vec_is_list(column) && Rf_length(column) > 0) {

void dplyr_lazy_vec_chop_grouped(SEXP chops_env, SEXP rows, SEXP data, bool rowwise) {
  SEXP names = PROTECT(Rf_getAttrib(data, R_NamesSymbol));
  R_xlen_t n = XLENGTH(data);

  const SEXP* p_data = VECTOR_PTR_RO(data);
  const SEXP* p_names = STRING_PTR_RO(names);
  for (R_xlen_t i = 0; i < n; i++) {
    SEXP prom = PROTECT(Rf_allocSExp(PROMSXP));
    SET_PRENV(prom, R_EmptyEnv);
    SEXP column = p_data[i];
    if (rowwise && vctrs::vec_is_list(column) && Rf_length(column) > 0) {
      SET_PRCODE(prom, column);
    } else {
      SET_PRCODE(prom, Rf_lang3(dplyr::functions::vec_chop, column, rows));
    }
    SET_PRVALUE(prom, R_UnboundValue);

    Rf_defineVar(rlang::str_as_symbol(p_names[i]), prom, chops_env);
    UNPROTECT(1);
  }

  UNPROTECT(1);
}

When we setup the promise that makes the chops, and this is a rowwise data, we can skip using vec_chop() and simply use the list columns as the chops.

This gives us the rowwise simplification for when we refer to the list column in an expression (not having to [[1]]).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and for non-original columns, i.e. those that are added, they go through the <DataMask>$add_one() method:

add_one = function(name, chunks, result) {
      if (inherits(private$data, "rowwise_df")){
        is_scalar_list <- function(.x) {
          vec_is_list(.x) && length(.x) == 1L
        }
        if (all(map_lgl(chunks, is_scalar_list))) {
          chunks <- map(chunks, `[[`, 1L)
        }
      }

      .Call(`dplyr_mask_add`, private, name, result, chunks)
    },

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah makes sense. Thanks!

cols <- map2(cols, names(cols), function(col, name) {
if (vec_is_list(private$current_data[[name]])) {
col <- list(col)
}
col
})
}
nrow <- length(self$current_rows())
new_tibble(cols, nrow = nrow)
},
Expand Down
12 changes: 12 additions & 0 deletions tests/testthat/test-context.R
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,18 @@ test_that("cur_data() gives current data without groups, cur_data_all() includes
)
})

test_that("cur_data()/cur_data_all() keeps list columns as lists in rowwise_df (#5901)", {
df <- tibble(x = list(tibble(a = 1), tibble(a = 2))) %>%
rowwise()

expect_true(
all(summarise(df, test = vec_is_list(cur_data()$x))$test)
)
expect_true(
all(summarise(df, test = vec_is_list(cur_data_all()$x))$test)
)
})

test_that("cur_group_rows() retrieves row position in original data", {
df <- tibble(x = c("b", "a", "b"), y = 1:3)
gf <- group_by(df, x)
Expand Down