
Binary Indexes


Overview

Faiss supports indexing binary vectors (with Hamming distance) via the IndexBinaryFlat, IndexBinaryIVF, IndexBinaryHNSW, IndexBinaryHash and IndexBinaryMultiHash indexes (all inheriting from IndexBinary).

These indexes store the vectors as arrays of bytes, so that a vector of size d takes only d / 8 bytes in memory. Note that at the moment, only vectors whose size is a multiple of 8 bits are supported. Of course, you can round up the vector length if needed.

The inputs to the add() and search() methods are also byte arrays ("uint8_t" in C++, "uint8" in numpy). The returned indices are 64-bit ints and the distances are 32-bit ints.

From Faiss 1.6.3, most index types also support range_search.
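
For example, a minimal Python sketch (the radius is an integer Hamming distance; lims delimits the per-query result ranges in D and I):

import numpy as np
import faiss

d = 64   # dimension in bits, must be a multiple of 8
db = np.random.randint(256, size=(1000, d // 8), dtype='uint8')
queries = np.random.randint(256, size=(5, d // 8), dtype='uint8')

index = faiss.IndexBinaryFlat(d)
index.add(db)

# Return all database vectors within Hamming distance 10 of each query.
# The results of query i are D[lims[i]:lims[i+1]] and I[lims[i]:lims[i+1]].
lims, D, I = index.range_search(queries, 10)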

IndexBinaryFlat

The "flat" binary index performs an exhaustive search.

The exhaustive search is carefully optimized, especially for 256-bit vectors, which are quite common. The Hamming distance computations are optimized using popcount CPU instructions.

Batching is applied on the query and the database side to avoid cache misses.

The values of hamming_batch_size and faiss::IndexBinaryFlat::query_batch_size can be customized to adjust the batch sizes, but the default values were found to be close to optimal for a wide range of settings.
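
For instance, in Python (a sketch: query_batch_size is a field of IndexBinaryFlat, and hamming_batch_size is a C++ global that recent builds expose through faiss.cvar; availability may vary by version):

import faiss

index = faiss.IndexBinaryFlat(256)

# Batch size applied on the query side for this index.
index.query_batch_size = 64

# Global batch size used by the Hamming distance routines
# (assumes the binding exposes the C++ global through faiss.cvar).
faiss.cvar.hamming_batch_size = 2 ** 16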

The examples show how to pass in binary data and how to query the index.

In C++
#include <faiss/IndexBinaryFlat.h>

// Dimension of the vectors, assumed to be a multiple of 8.
int d = 256;

// Vectors to be indexed, each represented by d / 8 bytes, laid out sequentially,
// i.e. the i-th vector starts at db[i * (d / 8)].
int nb = ...;
std::vector<uint8_t> db(nb * (d / 8));
// initialize db

// Vectors to be queried from the index. 
int nq = ...;
std::vector<uint8_t> queries(nq * (d / 8));

// Initializing index.
faiss::IndexBinaryFlat index(d);

// Adding the database vectors.
index.add(nb, db.data());

// Number of nearest neighbors to retrieve per query vector.
int k = ...;

// Output variables for the queries.
std::vector<int32_t> distances(nq * k);
std::vector<faiss::idx_t> labels(nq * k);

// Querying the index
index.search(nq, queries.data(), k, distances.data(), labels.data());

// distances[i * k + j] contains the distance from the i-th query vector to its j-th nearest neighbor.
// labels[i * k + j] contains the id of the j-th nearest neighbor of the i-th query vector.
In Python
import faiss
import numpy as np

# Dimension of the vectors.
d = 256

# Vectors to be indexed, each represented by d / 8 bytes in an (nb, d / 8) uint8 array,
# i.e. the i-th vector is db[i].
nb = ...
db = np.empty((nb, d // 8), dtype='uint8')
...initialize db...

# Vectors to be queried from the index.
nq = ...
queries = np.empty((nq, d // 8), dtype='uint8')
...initialize queries...

# Initializing index.
index = faiss.IndexBinaryFlat(d)

# Adding the database vectors.
index.add(db)

# Number of nearest neighbors to retrieve per query vector.
k = ...

# Querying the index
D, I = index.search(queries, k)

# D[i, j] contains the distance from the i-th query vector to its j-th nearest neighbor.
# I[i, j] contains the id of the j-th nearest neighbor of the i-th query vector.

IndexBinaryIVF

The "IVF" (Inverse Vector File) flavor speeds up the search by clustering the vectors. This clustering is done using a second (binary) index for quantization (usually a flat index). This is equivalent to the IndexIVFFlat of the floating-point indexes.

Examples:

In C++
#include <faiss/IndexBinaryIVF.h>

// Dimension of the vectors.
int d = 256;

// Vectors to be indexed, each represented by d / 8 bytes, laid out sequentially,
// i.e. the i-th vector starts at db[i * (d / 8)].
int nb = ...;
std::vector<uint8_t> db(nb * (d / 8));

// Vectors to train the quantizer.
int nt = ...;
std::vector<uint8_t> training(nt * (d / 8));

// Vectors to be queried from the index.
int nq = ...;
std::vector<uint8_t> queries(nq * (d / 8));

// Initializing the quantizer.
faiss::IndexBinaryFlat quantizer(d);

// Number of clusters.
int nlist = ...;

// Initializing index.
faiss::IndexBinaryIVF index(&quantizer, d, nlist);
index.nprobe = 4; // Number of nearest clusters to be searched per query. 

// Training the quantizer.
index.train(nt, training.data());

// Adding the database vectors.
index.add(nb, db.data());

// Number of nearest neighbors to retrieve per query vector.
int k = ...;

// Output variables for the queries.
std::vector<int32_t> distances(nq * k);
std::vector<faiss::idx_t> labels(nq * k);

// Querying the index
index.search(nq, queries.data(), k, distances.data(), labels.data());

// distances[i * k + j] contains the distance from the i-th query vector to its j-th nearest neighbor.
// labels[i * k + j] contains the id of the j-th nearest neighbor of the i-th query vector.
In Python
import faiss

# Dimension of the vectors.
d = 256

# Vectors to be indexed, each represented by d / 8 bytes.
# the i-th vector is db[i].
db = ...

# Vectors to train the quantizer.
training = ...

# Vectors to be queried from the index.
queries = ...

# Initializing the quantizer.
quantizer = faiss.IndexBinaryFlat(d)

# Number of clusters.
nlist = ...

# Initializing index.
index = faiss.IndexBinaryIVF(quantizer, d, nlist)
index.nprobe = 4 # Number of nearest clusters to be searched per query. 

# Training the quantizer.
index.train(training)

# Adding the database vectors.
index.add(db)

# Number of nearest neighbors to retrieve per query vector.
k = ...

# Querying the index.
D, I = index.search(queries, k)

# D[i, j] contains the distance from the i-th query vector to its j-th nearest neighbor.
# I[i, j] contains the id of the j-th nearest neighbor of the i-th query vector.

The IndexBinaryHNSW

This is the same method as for the floating point vectors.

Example usage here: TestHNSW
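
For reference, a minimal Python sketch (the constructor takes the dimension in bits and the HNSW branching factor M; the hnsw sub-object carries the usual HNSW parameters):

import numpy as np
import faiss

d = 256
db = np.random.randint(256, size=(10000, d // 8), dtype='uint8')

index = faiss.IndexBinaryHNSW(d, 16)   # M = 16 links per node
index.hnsw.efSearch = 32               # exploration factor at search time
index.add(db)                          # no training required

D, I = index.search(db[:5], 10)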

The IndexBinaryHash and IndexBinaryMultiHash

(Faiss 1.6.3 and above)

IndexBinaryHash: A classical method is to extract a hash from the binary vectors and to use it to split the dataset into buckets. At search time, all hash-table entries within a Hamming radius of nflip of the query vector's hash are visited.

The hash value is the first b bits of the binary vector. At search time, the number of visited buckets is 1 + b + b * (b - 1) / 2 + ... + comb(b, nflip).

IndexBinaryMultiHash: An extension of this method, studied in “Fast Search in Hamming Space with Multi-Index Hashing”, Norouzi et al, CVPR’12, consists in using nhash hash tables built from bits 0:b, b:2*b, …, (nhash-1)*b:nhash*b. This is sometimes referred to as LSH because of the multiple hash tables.
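
As an illustration, a minimal Python sketch of both variants (IndexBinaryHash takes the dimension in bits and the hash length b; IndexBinaryMultiHash additionally takes the number of hash tables nhash; nflip is a field that controls how far from the query hash the search looks):

import math
import numpy as np
import faiss

d = 256
db = np.random.randint(256, size=(10000, d // 8), dtype='uint8')
queries = np.random.randint(256, size=(5, d // 8), dtype='uint8')

# Single hash table: the hash is the first 16 bits of each code.
index = faiss.IndexBinaryHash(d, 16)
index.nflip = 2            # visit buckets up to 2 bit flips away from the query hash
index.add(db)
D, I = index.search(queries, 10)

# Buckets visited per query: 1 + b + b*(b-1)/2 + ... + comb(b, nflip)
print(sum(math.comb(16, i) for i in range(2 + 1)))   # 1 + 16 + 120 = 137

# Multi-index hashing: 4 hash tables built on disjoint 16-bit slices of the code.
mindex = faiss.IndexBinaryMultiHash(d, 4, 16)
mindex.nflip = 2
mindex.add(db)
D, I = mindex.search(queries, 10)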

A comparison between IndexBinaryHash and IndexBinaryIVF is here: Binary hashing index benchmark

Shorter versions using index factory

The faiss::index_binary_factory() allows for shorter declarations of binary indexes. It is especially useful for IndexBinaryIVF, for which a quantizer needs to be initialized.

How to use index_binary_factory:

In C++

Instead of the above initialization code:

// Initializing the quantizer.
faiss::IndexBinaryFlat quantizer(d);

// Number of clusters.
int nlist = 32;

// Initializing index.
faiss::IndexBinaryIVF index(&quantizer, d, nlist);
index.nprobe = 4; // Number of nearest clusters to be searched per query. 

one could write:

#include <faiss/AutoTune.h>
  
// Initializing the index with the binary index factory.
faiss::IndexBinaryIVF *index = dynamic_cast<faiss::IndexBinaryIVF *>(faiss::index_binary_factory(d, "BIVF32"));
index->nprobe = 4; // Number of nearest clusters to be searched per query.
In Python

Instead of the above initialization code:

# Initializing the quantizer.
quantizer = faiss.IndexBinaryFlat(d)

# Number of clusters.
nlist = 32

# Initializing index.
index = faiss.IndexBinaryIVF(quantizer, d, nlist)
index.nprobe = 4 # Number of nearest clusters to be searched per query. 

one could write:

# Initializing the index with the binary index factory.
index = faiss.index_binary_factory(d, "BIVF32")
index.nprobe = 4 # Number of nearest clusters to be searched per query.

Table of available index_binary_factory strings:

String          | Class name           | Bytes per vector | Comments
BFlat           | IndexBinaryFlat      | d/8              | simple flat index
BIVF1024        | IndexBinaryIVF       | d/8              | IVF with 1024 centroids and a flat quantizer
BHNSW16         | IndexBinaryHNSW      | d/8 + 16 * 2 * 4 | HNSW with branching factor M=16
BIVF1024_HNSW16 | IndexBinaryIVF       | d/8              | IVF with 1024 centroids and HNSW with M=16 used as a quantizer
BHash32         | IndexBinaryHash      | d/8              | binary hash index with a 32-bit prefix
BHash4x32       | IndexBinaryMultiHash | 4*64/8           | multiple binary hash index with 32 bits each

The bytes per vector are approximate; there is additional memory overhead due to the geometric std::vector allocation, the coarse quantizer of the IVF index and the HNSW graph structure.

Asymmetric search with binary codes

Asymmetric search (i.e. the database vectors are compressed but the queries are not) is not supported directly by the binary indexes. Here is a script that does it manually: demo_asymmetric_binary.ipynb.

RaBitQuantizer

This class implements the key methods used by the RaBitQ algorithm, as described in the paper RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search by Gao and Long.

This helper class is used by the main index class of the RaBitQ algorithm. It includes the key methods: creating a codebook, decoding a vector, and an efficient method to approximate distances (a usage sketch follows the list below).

Deeper explanations of some functions:

  • train: this method is intentionally left empty, as no training is required. The RaBitQuantizer class inherits from AdditiveQuantizer, which in turn inherits from Quantizer. The train method is declared as virtual in the Quantizer class, which is why the method exists in RaBitQuantizer even though it performs no operations.
  • compute_codes: takes input vectors and uses them to compute the codebook as described in Section 3 of the paper. It generates binary codes for each input vector by recording whether each dimension is positive or negative after subtracting a centroid, and calculates scaling factors that help accurately estimate distances later during search.
  • decode: reconstructs approximate vectors from their binary codes by converting each bit back into a signed value, scaling it using stored factors, and adding back the centroid if used.
  • get_distance_computer: returns a distance computation tool based on whether the query vectors are quantized or not. It creates one tool for unquantized queries and another optimized for quantized queries. Both compute approximate distances using the RaBitQ codes.
  • qb (query bits): controls the level of quantization applied to query vectors, with qb=0 meaning no quantization and qb>0 specifying how many bits to use for scalar quantization of each query dimension.
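
A minimal usage sketch (assuming the Python bindings expose RaBitQuantizer with numpy-friendly compute_codes/decode wrappers, as in recent Faiss releases; exact availability may vary by version):

import numpy as np
import faiss

d = 64
x = np.random.rand(1000, d).astype('float32')

rq = faiss.RaBitQuantizer(d)
rq.train(x)                   # intentionally a no-op, kept for interface compatibility
codes = rq.compute_codes(x)   # 1 bit per dimension plus per-vector scaling factors
recons = rq.decode(codes)     # approximate reconstruction of the input vectors
print(codes.shape, recons.shape)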

IndexIVFRaBitQ

This class implements the index that combines IVF coarse quantization with RaBitQ's binary quantization. It relies on the methods implemented in the RaBitQuantizer class.

IndexIVFRaBitQ inherits from IndexIVF; its operation consists of two main phases, an index phase and a query phase (a usage sketch follows the lists below).

Index Phase:

  • Similar to regular IVF: Vectors are assigned to their nearest centroids
  • For each vector, the residual (vector - centroid) is computed
  • The residual is encoded using RaBitQuantizer, representing each dimension with a single bit

Query Phase:

  • Similar to regular IVF: the query vector is compared to all centroids, and the nprobe closest centroids are selected
  • For each selected cluster, distances between the query and encoded vectors are computed directly on binary codes without full decoding
  • Similar to any index: results are collected in a heap to return the k nearest neighbors
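
A minimal usage sketch (assuming faiss.IndexIVFRaBitQ is exposed in the Python bindings; the constructor follows the usual IVF pattern of coarse quantizer, dimension and number of lists, and qb controls the query quantization described above):

import numpy as np
import faiss

d = 64
nlist = 100
xt = np.random.rand(10000, d).astype('float32')
xq = np.random.rand(5, d).astype('float32')

coarse = faiss.IndexFlatL2(d)
index = faiss.IndexIVFRaBitQ(coarse, d, nlist)
index.qb = 8             # scalar-quantize queries to 8 bits (0 = keep full precision)
index.train(xt)          # trains the coarse quantizer; residuals are encoded with RaBitQ
index.add(xt)

index.nprobe = 10
D, I = index.search(xq, 10)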