Description
I've always loved how in Julia (and MATLAB) one can create a new array from an old one, using what are now the APL indexing rules. Basically, if you index a collection of values with a collection of indices, you get a new collection of the indexed values. Beautiful, simple. Indexing has also been extended to arrays that don't use 1-based indexing, e.g. by the OffsetArrays.jl package.
I'm not sure if this issue exists elsewhere as its own entity (cleaning up distinctions between arrays and associatives was surely mentioned in #20402 and this Julep seems to be a logical extension of #22907), but here I propose specifically that we extend indexing of and by Associative and make related changes so that the semantics are consistent across these two types of container. I prototyped ideas at https://github.com/andyferris/AssociativeArray.jl and basically came up with the ability to (with simple code):
- Index an `Associative{K,V}` with an `Associative{I,K}` to get an `Associative{I,V}`. E.g. `Dict(:a=>1, :b=>2, :c=>3)[Dict("a"=>:a, "c"=>:c)] == Dict("a"=>1, "c"=>3)`.
- Index an `Associative{K,V}` with an `AbstractArray{K,N}` to get an `AbstractArray{V,N}`. E.g. `Dict(:a=>1, :b=>2, :c=>3)[[:c, :a]] == [3,1]`.
- Index an `AbstractArray{T,N}` with an `Associative{K,I}` to get an `Associative{K,T}` (where `I` might be `Int` for linear indexing, or a `CartesianIndex{N}` for Cartesian indexing). E.g. `[11,12,13][Dict(:a=>1, :c=>3)] == Dict(:a=>11, :c=>13)`.
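The three rules above can be sketched without touching `getindex` at all. The following is a minimal sketch, not the code from AssociativeArray.jl: `lookup(a, b)` is a hypothetical helper standing in for the proposed `a[b]`, and it is written against Julia 1.x, where `Associative` is named `AbstractDict`.

```julia
# Hypothetical helper (not part of Base): `lookup(a, b)` stands in for the proposed `a[b]`.
# In Julia 1.x, `Associative` is spelled `AbstractDict`.

# Dictionary indexed by dictionary: keys come from `b`, values are looked up in `a`.
lookup(a::AbstractDict, b::AbstractDict) = Dict(i => a[k] for (i, k) in b)

# Dictionary indexed by array: the result is an array with the shape of `b`.
lookup(a::AbstractDict, b::AbstractArray) = map(k -> a[k], b)

# Array indexed by dictionary: keys come from `b`, values are looked up in `a`.
lookup(a::AbstractArray, b::AbstractDict) = Dict(k => a[i] for (k, i) in b)

d = Dict(:a => 1, :b => 2, :c => 3)

r1 = lookup(d, Dict("a" => :a, "c" => :c))         # Dict("a" => 1, "c" => 3)
r2 = lookup(d, [:c, :a])                           # [3, 1]
r3 = lookup([11, 12, 13], Dict(:a => 1, :c => 3))  # Dict(:a => 11, :c => 13)
```

In every method the output takes its indices from `b` and its values from `a`, which is the property the proposal relies on.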
The semantics are consistent across arrays and dictionaries, and provide that for out = a[b]:
- The output container `out` shares the indices of `b` (note: these are `CartesianRange` for arrays).
- The values `out[i]` correspond to `a[b[i]]`.
This is fully consistent with both the Base arrays and the OffsetArrays.jl package (we can do something similar for `setindex!`).
To make everything consistent, it helps to make the following associated changes:
- Make `Associative`s be containers of values, not of `index=>value` pairs, so that arrays and dictionaries are consistent on this fundamental point. Use the existing `pairs` function when necessary (and ideally make it preserve indexability).
- Make `similar` always return a container with the same indices, even for dictionaries. Ideally, unify `similar` across `Associative`s and `Array`s (for example, a dictionary which is `similar` to a distributed array might also be distributed) via use of the indices.
- Have a new `empty` function that makes empty `Dict`s and `Vector`s to which elements should be added. (Done, "Add `empty` and change `similar(::Associative)`", #24390).
- Consider whether the collection of things you call `getindex` and `setindex!` with should be called `indices` rather than `keys` (and rename the current `indices(::AbstractArray)` to something else).
- Have `view` work for the various combinations where `getindex` works.
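To make the first and third points concrete, here is a short illustration of how arrays and dictionaries differ under iteration, and of `pairs` and `empty`. It reflects the behavior of released Julia 1.x as I understand it (dictionaries there still iterate `key=>value` pairs), not the behavior proposed here.

```julia
d = Dict(:a => 1)
v = [11, 12, 13]

# The inconsistency: iterating a dictionary yields pairs, iterating an array yields values.
dict_first  = first(d)    # :a => 1
array_first = first(v)    # 11

# `pairs` and `values` give an explicit view of either side for both container kinds.
array_pairs = collect(pairs(v))    # [1 => 11, 2 => 12, 3 => 13]
dict_values = collect(values(d))   # [1]

# `empty` returns an empty container of the same kind, ready to have elements added.
empty_dict = empty(d)     # an empty Dict{Symbol,Int}
empty_vec  = empty(v)     # an empty Vector{Int}
```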
The demonstration package also prototypes making AbstractArray{T, N} <: Associative{CartesianIndex{N}, T} - I don't think this is strictly necessary but it helped (me) to highlight which parts of the existing interface were inconsistent. The package does demonstrate that we can put something simple together without excessive amounts of code (some performance tuning is surely required).
Finally, a word on what motivates this: lately I've been playing with what fundamental data operations (such as mapping, grouping, joining or filtering) would be useful for both generic data structures and tables/dataframes (that iterate rows). I found that whenever I created, say, a grouping (using a dictionary of groups), I immediately felt the loss of the ability to do complex indexing and other operations with the result (as well as having to worry whether the output iterates values or key-value pairs, etc).