Experiments with LUT'ed binary search in C++

Instead of starting binary search with the range [0, N), a small lookup-table is used to provide a smaller, more precise initial range.

See the first reference for a full explanation.

Obviously this approach only makes sense for repeated searches over the same data.

The purpose of this repo is to explore issues around building and using the LUT on various types of input, it is not to build a production-ready binary-search per se.

Notes

Unsigned keys work very well, especially if they're uniformly distributed.

The extra-shift approach, which AFAIK is novel by me, but was probably explored by Tarjan et.al in the 1980s, neatly handles the problem of small-value keys in a larger key-type, which would otherwise distribute poorly over the LUT buckets because all the top-bits would be zeros.

Even better distribution can be had by applying key-compaction, in which we use parallel bit-extraction -- e.g PEXT from the BMI2 instruction set -- to calculate a better bucket distribution by removing irrelevant bits.

Issues

Signed keys work, but distributes poorly for (small) standard distributions because extra-shift can't handle them. The idea for improving this is to remove/ignore all redundant sign bits. i.e, if the input is int32 keys, but all values fit in 18 bits, then treat the keys like they're int18.

Bucket distribution issues relating to floating-point keys are not well understood at this time. but key-compaction seems promising.

Related work

This connects to my work on radix sorting in that sorting is a requirement, of course, and it also shares the concept of key-derivation to impose a total order of the input keys in 'unsigned space', which is what enables the bit-tricks used to parition the keys.

In addition, in the radix sorting problem we care about minimizing the number of bits of key that we need to sort on (i.e find the relevant bits), in order to reduce the number of sorting passes required. By using something like the 'extra-shift' approach, or full key-compaction, together with column-skipping, improved performance can be had.

References

David Geier "Optimizing binary search", 2013.
Relates to "Interpolation Search".

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
lutbs-test.cpp		lutbs-test.cpp
lutbs.hpp		lutbs.hpp
relbits.cpp		relbits.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Experiments with LUT'ed binary search in C++

Notes

Issues

Related work

References

About

Uh oh!

Languages

License

eloj/binary-search-kdf

Folders and files

Latest commit

History

Repository files navigation

Experiments with LUT'ed binary search in C++

Notes

Issues

Related work

References

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Languages