Description
The documentation (in ?data.table) still erroneously claims that x must be keyed when i is a data.table. It omits the exemption provided by on:
When i is a data.table, x must have a key. i is joined to x using x's key and the rows in x that match are returned. An equi-join is performed between each column in i to each column in x's key; i.e., column 1 of i is matched to the 1st column of x's key, column 2 to the second, etc. The match is a binary search in compiled C in O(log n) time. If i has fewer columns than x's key then not all of x's key columns will be joined to (a common use case) and many rows of x will (ordinarily) match to each row of i. If i has more columns than x's key, the columns of i not involved in the join are included in the result. If i also has a key, it is i's key columns that are used to match to x's key columns (column 1 of i's key is joined to column 1 of x's key, column 2 of i's key to column 2 of x's key, and so on for as long as the shorter key) and a binary merge of the two tables is carried out. In all joins the names of the columns are irrelevant; the columns of x's key are joined to in order, either from column 1 onwards of i when i is unkeyed, or from column 1 onwards of i's key. In code, the number of join columns is determined by min(length(key(x)), if (haskey(i)) length(key(i)) else ncol(i)).
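For reference, here is a small sketch of the keyed-join behaviour that passage describes. It uses hypothetical toy tables. x is keyed on id, while i is unkeyed, has a differently named first column, and carries one extra column. Going by the quoted rules, only i's first column should be matched to x's key, and the extra column should be carried into the result.

```r
library(data.table)

# Hypothetical toy tables: x is keyed on 'id'
x <- data.table(id = c("a", "b", "c"), val = 1:3)
setkey(x, id)

# i is unkeyed; its first column is named 'code', not 'id' (names are irrelevant here)
i <- data.table(code = c("c", "b"), note = c("third", "second"))

# Number of join columns per the documented formula: min(1, 2) = 1
n_join <- min(length(key(x)), if (haskey(i)) length(key(i)) else ncol(i))

# Keyed join: i's first column is matched against x's key; 'note' is not joined on
res <- x[i]
print(res)
```

The result has one row per row of i, in i's order, with x's val looked up by position against the key and i's note column appended.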
This should presumably be updated, perhaps something like:
If i is a data.table, either x must be keyed or the join columns must be specified in on (see on below). In the case that x is keyed and on is not used, [repeat original wording from "i is joined to x..."]
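To illustrate the case the proposed wording adds, here is a minimal sketch with a completely unkeyed x. It uses hypothetical toy tables and assumes a data.table version that supports on. The join columns are named explicitly as c(x_column = "i_column"), so no key is needed on either table.

```r
library(data.table)

# Hypothetical toy tables; x has no key at all
x <- data.table(id = c("a", "b", "c"), val = 1:3)
i <- data.table(code = c("c", "b"))

# on = c(id = "code"): match x's 'id' column against i's 'code' column
res <- x[i, on = c(id = "code")]
print(res)
```

Since on only describes the join, x should still be unkeyed afterwards.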
By the way, my instinct says that on overrides keyed joins (i.e., if we specify on, it doesn't matter what the keys of either table are). Is that correct? If so, perhaps that should be documented as well. Does specifying on override the key of x? Of i?
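One quick way to probe this empirically is sketched below. The tables are hypothetical. x is keyed on id and i is keyed on tag, but the join is done with on = "grp", a column that is in neither key. If on takes precedence, the match should be on grp alone. Both tables should also keep their existing keys, since on does not re-key anything.

```r
library(data.table)

# Hypothetical toy tables, each keyed on a column unrelated to the join
x <- data.table(id = c("a", "b", "c"), grp = c("p", "q", "p"), val = 1:3)
setkey(x, id)
i <- data.table(tag = "z", grp = "p")
setkey(i, tag)

# Join on 'grp' explicitly, ignoring key(x) and key(i) for the match
res <- x[i, on = "grp"]
print(res)

# The keys of both inputs are left as they were
print(key(x))
print(key(i))
```

This only shows the observable behaviour for one case. Whether the documentation should promise it in general is the question above.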