join in [.data.table could be consistent to SQL #1615

@jangorecki

Currently, data.table joins are consistent with base R: the result keeps a single join column, filled with the values from the right-hand table.
This is somewhat awkward for some queries.

library(data.table)
x = data.table(a=1:3, w=letters[1:3])
y = data.table(b=3:5, z=6:4)
x[y, on=c(a="b")]
#   a  w z
#1: 3  c 6
#2: 4 NA 5
#3: 5 NA 4
x[y, .(a, b), on=c(a="b")]
#   a b
#1: 3 3
#2: 4 4
#3: 5 5

Consistency with base R joins could be kept in the merge.data.table method for the base R merge generic, while joins within [.data.table could be consistent with SQL, which does not impose the same limitation as base R. [.data.frame does not support joins, so this would not break consistency there.
The change would generally break code that relies on the invalid base R join behavior.
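
As a rough illustration of the base R-consistent side of this proposal, the same right outer join can already be written with the merge generic, which dispatches to merge.data.table. This is a minimal sketch reusing the x and y tables from above; the variable name res_merge is only illustrative.

```r
library(data.table)

x = data.table(a = 1:3, w = letters[1:3])
y = data.table(b = 3:5, z = 6:4)

# Right outer join through the merge generic (dispatches to merge.data.table).
# As in base R, the result has a single join column, named after x's column.
res_merge = merge(x, y, by.x = "a", by.y = "b", all.y = TRUE)

# The join column carries y's values even for rows with no match in x,
# so a is 3, 4, 5 here, while w is NA for the unmatched rows.
print(res_merge)
```

Under the proposal, this output would stay as it is, and only x[y, on=...] would change.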

For reference, the SQL output from PostgreSQL:

#$`SELECT * FROM x RIGHT OUTER JOIN y ON x.a = y.b;`
#    a  w b z
#1:  3  c 3 6
#2: NA NA 4 5
#3: NA NA 5 4
#
#$`SELECT a, b FROM x RIGHT OUTER JOIN y ON x.a = y.b;`
#    a b
#1:  3 3
#2: NA 4
#3: NA 5

Just to link related issues: #1700, #1761, #1469

Labels: breaking-change, enhancement, joins, question