Arrays with acceleration indices.
AcceleratedArrays provides (secondary) acceleration indexes for Julia AbstractArrays. Such
acceleration indexes can be used to speed up certain operations, particularly those
involving searching through the values - for example, an AcceleratedArray may have more
efficient implementations of functions such as findall, filter, and unique.
As a general rule, this package has been implemented for the purposes of accelerating analytics workloads and is designed to support functional, non-mutating workflows. It is currently not supported to add an index to data you expect to mutate afterwards.
To download this package, from Julia v1.0 press ] to enter package mode and type:
pkg> dev https://github.com/andyferris/AcceleratedArrays.jlAn AcceleratedArray is generally created by using the accelerate and accelerate!
functions.
# Construct a hash mapping to unique names
a = accelerate(["Alice", "Bob", "Charlie"], UniqueHashIndex)
# Rearrange an array of random numbers into ascending order
b = accelerate!(rand(1:100, 100), SortIndex)The resulting arrays can be used just like regular Julia arrays, except some operations become faster. For example, the hash map will let us find a certain element without exhaustively searching the array, or we can easily find all the elements within a given interval with a sorted array.
# Find the index of "Bob" in `a`
findall(isequal("Bob"), a)
# Return all the numbers in `b` between 40 and 60
filter(in(40..60), b)Accelerations are fully implemented for the following functions, where a is an
AcceleratedArray:
x ∈ acount(pred, a)findall(pred, a)filter(pred, a)
There is some work-in-progress on a variety of other functions, including some from SplitApplyCombine:
findfirst(pred, a)andfindlast(pred, a)unique(a)group,groupinds,groupviewandgroupreduceinnerjoin
Accelerations are only available for some predicates pred, which naturally depend on the
acceleration index used (see below for a full set).
The package intruduces the AbstractIndex supertype and the following concrete implemetations.
Generally, an index is created when the user calls accelerate or accelerate!.
This index constructs a hashmap between values in the array, and the corresponding array
indices. For example, invoking findall to search for the locations of certain values
will be reduced to a simple dictionary lookup. Primarily accelerates commands using the
isequal predicate.
Like HashIndex, except each value in the array can only appear once. Apart from
guaranteeing uniqueness, certain operations may be faster with a UniqueHashIndex than
with a HashIndex.
This index determines the order of the elements (with respect to isless). This index
can accelerate not only the isequal predicate, but a variety of other order-based
predicates as well (see below).
The accelerate! function will rearrange the input array, like sort!. This can speed
up operations due to simplified algorithms and cache locality.
Like SortIndex, except each value in the array can only appear once. Apart from
guaranteeing uniqueness, certain operations may be faster with a UniqueSortIndex than
with a SortIndex.
It is simple for a user or another package to implement an AbstractIndex - for instance
a third-party package may provide a spatial acceleration index, or an index for fast
textual search. Simply overload accelerate (and optionally accelerate!) as well as the
operations you would like to accelerate, such as filter, findall, etc. Indices for
unique sets of values may inherit from AbstractUniqueIndex <: AbstractIndex.
In Julia, sorting is (typically) achieved using the isless and isequal predicates,
which are designed to provide a canonical total order for values. Currently, the
acceleration indices rely on these rather than the comparison operators ==, <, <=,
>, >= and !=.
To make life easier, this package introduces a number of new convenience functions:
islessequal(a, b) = isless(a, b) || isequal(a, b)isgreater(a, b) = isless(b, a)isgreaterequal(a, b) = isless(b, a) || isequal(a, b)
Any of these support "currying", which is a simple syntax for creating a closure such as
isequal(a) = (b -> isequal(a, b)). Such curried predicates are picked up by multiple
dispatch to accelerate operations like findall(isequal(3.0), accelerated_array).
It is common to want to search for all values in a range. This package introduces an
Interval type to represent the set of of values between two endpoints (with respect to
isless and isequal).
An interval is easily created with the .. operator via the syntax a .. b. To find if
a value is in this range, use the in function/operator (alternatively spelled ∈, which
can be inserted at the REPL via \in <TAB>). For example, 3 ∈ 0 .. 10 is true but
13 ∈ 0 .. 10 is false.
By default, an interval is inclusive of its endpoints, such that 10 ∈ 0 .. 10. An endpoint
can be excluded via the lessthan or greaterthan function, which returns a value almost equal
to but slightly less/greater than its input. An interval exclusive of both its endpoints can be
expressed as greaterthan(a) .. lessthan(b). For example 10 ∉ 0 .. lessthan(10).
This package is still young, and could support some more features, such as:
- Accelerate more functions, including those in
SplitApplyCombine. - Figure out how to support
missing,==,<with either a hash- or sort-based index. - Move
Intervals into their own package, potentially reconcile with IntervalSets.jl (which currently uses<=and>=for comparisons).