Library Features
Here we present the main library features, including the supported index types, distance functions and data types.
Index Types
SVS supports the following index types:
Graphs for static datasets
The Vamana graph (in Python, in C++) enables fast in-memory graph-based similarity search with high accuracy for static databases, where the database is fixed and never updated.
Graphs for streaming data
The DynamicVamana graph (in Python, in C++) enables fast in-memory graph-based similarity search with high accuracy for streaming data, where the database is built dynamically by adding and removing vectors.
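For illustration, a minimal sketch of the streaming workflow might look as follows. It assumes the svs Python package with uint64 external ids; the exact signatures of svs.DynamicVamana.build, add_points, and delete_points should be checked against the API reference.

import numpy as np
import svs

# Synthetic float32 vectors and external ids (assumed to be uint64).
data = np.random.rand(10_000, 128).astype(np.float32)
ids = np.arange(10_000, dtype=np.uint64)

# Graph construction parameters (illustrative values).
parameters = svs.VamanaBuildParameters(graph_max_degree=64, window_size=128, alpha=1.2)

# Build the index, then mutate it as the stream evolves.
index = svs.DynamicVamana.build(parameters, data, ids, svs.DistanceType.L2, num_threads=4)

# Add newly arrived vectors and remove vectors that are no longer needed.
new_vectors = np.random.rand(100, 128).astype(np.float32)
new_ids = np.arange(10_000, 10_100, dtype=np.uint64)
index.add_points(new_vectors, new_ids)
index.delete_points(ids[:100])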
Flat Index
The flat index (in Python, in C++) can be used to run an exhaustive search, which is useful, for example, to compute the ground-truth nearest neighbors for a dataset.
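For example, exact nearest neighbors can be computed with a sketch like the following; the file name is a placeholder and the constructor arguments should be checked against the svs.Flat reference.

import numpy as np
import svs

# Load the database vectors (placeholder file name).
data = svs.VectorDataLoader("data_f32.svs", svs.DataType.float32)

# Brute-force index: no graph is built, every query is compared against all vectors.
index = svs.Flat(data, distance=svs.DistanceType.L2)

# Queries must match the dataset dimensionality; the exact results can be saved
# as the ground truth for recall measurements of the graph indexes.
queries = np.random.rand(1_000, 128).astype(np.float32)
ground_truth, distances = index.search(queries, 10)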
Distance functions
SVS supports the distance functions listed in Built-In Distance Functors (see svs.DistanceType for the corresponding Python classes). The distance function is specified when the index is created via the corresponding index constructor. In the case of the Vamana index, it must also be specified when the graph is built (see svs.Vamana.build and svs::Vamana::build() for details).
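As a sketch, assuming the Python API referenced above and a placeholder file name, the distance function could be passed as follows.

import svs

# Loader for the vectors to index (placeholder file name).
data = svs.VectorDataLoader("data_f32.svs", svs.DataType.float32)

# Graph construction parameters (illustrative values).
parameters = svs.VamanaBuildParameters(graph_max_degree=64, window_size=128)

# The distance function is passed when the graph is built; it must also be
# supplied to the index constructor when a saved index is loaded later.
index = svs.Vamana.build(parameters, data, svs.DistanceType.MIP, num_threads=4)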
Data types
The supported data types are: float32, float16, int8 and uint8. Other data types might work but performance has not been tested.
The data type can be set independently for the database vectors and the query vector. For example, one could compress the database vectors to float16, which allows for a 2x storage reduction often with negligible accuracy loss, and keep the query in float32.
In Python
The data type for the database vectors is specified by the data_type argument when the vectors are loaded with svs.VectorDataLoader. The data type for the query vectors is specified in the query_type argument of the corresponding index constructors (svs.Vamana, svs.Flat).
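For example, a float16 database can be paired with float32 queries as in the following sketch (placeholder file name; the distance keyword is an assumption, while data_type and query_type are the arguments described above).

import svs

# Database vectors stored on disk as float16; no conversion is performed,
# so the data_type must match how the file was written (see the warning below).
data = svs.VectorDataLoader("data_f16.svs", svs.DataType.float16)

# Queries are kept in float32 via the query_type argument of the constructor.
index = svs.Flat(data, distance=svs.DistanceType.L2, query_type=svs.DataType.float32)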
In C++
The data type of the database vectors is specified in the template argument of svs::VectorDataLoader:
svs::VectorDataLoader<float>("data_f32.svs")
For details on setting the data type of the query vectors, see Indexes.
Warning
This will not perform any dataset conversion. If a dataset was saved to disk as float16 data, for example, then it must be loaded with data_type = svs.DataType.float16 in Python or svs::Float16 in C++.
Supported (query, database vector) data type combinations include (float32, float32), (float32, float16), (uint8, uint8), and (int8, int8), among others.
Vector compression
Note
The open-source SVS library supports all documented features except our proprietary vector compression methods, LVQ and LeanVec, which are not open-source and run only on Intel CPUs. These are available via a shared library (C++) and a PyPI package (Python). See the examples for Python and C++ usage.
LeanVec and LVQ Compression Techniques
SVS incorporates two novel compression strategies, LVQ [ABHT23] and LeanVec [TBAH24], to enhance memory efficiency and accelerate similarity search operations. These techniques compress high-dimensional vectors while maintaining the spatial relationships necessary for accurate retrieval. See Choosing the Right Compression for details on selecting the best approach for your case.
LVQ: Locally-Adaptive Vector Quantization
LVQ employs a combination of per-vector normalization and scalar quantization to reduce memory footprint. It supports rapid distance calculations, especially when paired with SIMD-optimized layouts like Turbo LVQ. The compression parameters are learned from the input data, allowing for adaptive and efficient encoding.
See this example for details on how to use LVQ and the guidelines to choose the right compression strategy.
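To make the idea concrete, the NumPy sketch below illustrates the general principle of dataset-mean removal followed by per-vector scalar quantization. It is only a conceptual illustration; it does not reproduce the actual LVQ encoding, layouts, or SIMD optimizations.

import numpy as np

def lvq_like_encode(data, bits=8):
    # Conceptual illustration only, not the SVS LVQ implementation.
    mean = data.mean(axis=0)                     # dataset-level normalization
    centered = data - mean
    lo = centered.min(axis=1, keepdims=True)     # per-vector constants
    hi = centered.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-12) / (2**bits - 1)
    codes = np.round((centered - lo) / scale).astype(np.uint8)
    return codes, lo, scale, mean

def lvq_like_decode(codes, lo, scale, mean):
    # Reconstruct approximate vectors from the codes and per-vector constants.
    return codes.astype(np.float32) * scale + lo + mean

data = np.random.rand(1000, 128).astype(np.float32)
codes, lo, scale, mean = lvq_like_encode(data, bits=8)
approx = lvq_like_decode(codes, lo, scale, mean)
print("max reconstruction error:", np.abs(approx - data).max())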
LeanVec Compression
LeanVec builds upon LVQ by integrating dimensionality reduction, making it particularly effective for very high-dimensional datasets. It delivers significant performance improvements while conserving memory. LeanVec is designed to handle both cases where queries follow the same distribution as the base vectors (in-distribution) and cases where they follow a different distribution (out-of-distribution), such as in cross-modal search tasks like text-to-image retrieval.
See these examples in Python and C++ for details on how to use LeanVec and the guidelines to choose the right compression strategy.
Note
Support for out-of-distribution queries is in experimental mode. You can try it using our Python benchmarking tool.
Two-Level Compression
Both LVQ and LeanVec support a dual-stage compression scheme. In LVQ, the first stage quantizes the vector to capture its core structure, while the second stage encodes the residual error for improved accuracy. The initial quantization enables fast candidate retrieval with B₁ bits per dimension, and the residuals encoded with B₂ bits per dimension are used for re-ranking. For example, LVQ4x8 uses 4 bits per dimension for fast candidate retrieval and 8 bits per dimension for re-ranking. LVQ can also operate as a one-level scheme, using B₁ bits per dimension without re-ranking (e.g., LVQ8).
LeanVec follows a similar approach: the first level reduces dimensionality and applies LVQ for fast search, and the second level applies LVQ to the original vectors for precise re-ranking. Importantly, neither method relies on full-precision vectors – everything operates on compressed data.
Naming Convention
Compression configurations follow the format: LVQ<B₁>x<B₂>, where:
B₁: Bits per dimension for the first-level quantization (primary).
B₂: Bits per dimension for the second-level residual encoding (residual).
Examples:
LVQ4x8: 4 bits for initial quantization, 8 bits for residuals (total 12 bits per dimension).
LVQ8: Single-level compression using 8 bits per dimension.
LeanVec uses the same naming scheme. The primary_kind and secondary_kind arguments set the number of bits per dimension for the first and second levels, respectively.
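As a quick back-of-the-envelope check of these figures, the snippet below computes the compression ratio of a configuration relative to uncompressed float32 storage, ignoring the small per-vector constants stored alongside the codes.

FLOAT32_BITS = 32

def compression_ratio(primary_bits, residual_bits=0):
    # Ratio of float32 storage to LVQ<B1>x<B2> storage per dimension.
    return FLOAT32_BITS / (primary_bits + residual_bits)

print(compression_ratio(8))     # LVQ8:   32 / 8  = 4.0x
print(compression_ratio(4, 8))  # LVQ4x8: 32 / 12 ≈ 2.7x
print(compression_ratio(4))     # Primary level of LVQ4x8 alone (candidate retrieval): 8.0x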
Training and Adaptability
The effectiveness of LVQ and LeanVec stems from their ability to learn compression parameters from the data itself. This requires a representative sample of vectors during index initialization. If the data distribution shifts significantly over time, compression quality may degrade – a common challenge for all data-dependent methods.
8-bit scalar quantization
The open-source SVS library supports 8-bit scalar quantization. It uses the global minimum and maximum values across all embeddings to scale them, then applies uniform quantization per dimension using 8 bits. This functionality is currently available only in the C++ implementation, with support for Python bindings coming soon.
See this example for details on how to use the 8-bit scalar quantization.
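Although this feature is currently C++-only, the quantization rule is simple to illustrate. The NumPy sketch below is a conceptual illustration of global min/max scaling with 8-bit uniform quantization, not the SVS implementation.

import numpy as np

def scalar_quantize_8bit(data):
    # Conceptual illustration only, not the SVS C++ implementation.
    lo = data.min()                       # global minimum across all embeddings
    hi = data.max()                       # global maximum across all embeddings
    scale = (hi - lo) / 255.0
    codes = np.round((data - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

data = np.random.rand(1000, 128).astype(np.float32)
codes, lo, scale = scalar_quantize_8bit(data)
print("max error:", np.abs(dequantize(codes, lo, scale) - data).max())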
References
Subramanya, S.J.; Devvrit, F.; Simhadri, H.V.; Krishnaswamy, R.; Kadekodi, R.: DiskANN: Fast accurate billion-point nearest neighbor search on a single node. In: Advances in Neural Information Processing Systems 32 (2019).
Aguerrebere, C.; Bhati, I.; Hildebrand, M.; Tepper, M.; Willke, T.: Similarity search in the blink of an eye with compressed indices. In: Proceedings of the VLDB Endowment 16, 11 (2023), 3433-3446.
Aguerrebere, C.; Hildebrand, M.; Bhati, I.; Willke, T.; Tepper, M.: Locally-adaptive Quantization for Streaming Vector Search. In: arXiv preprint arXiv:2402.02044 (2024).
Malkov, Y.A. and Yashunin, D.A.: Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2018), 824-836.
Johnson, J.; Douze, M.; Jégou, H.: Billion-scale similarity search with GPUs. In: IEEE Transactions on Big Data 7, 3 (2019), 535-547.
Guo, R.; Sun, P.; Lindgren, E.; Geng, Q.; Simcha, D.; Chern, F.; Kumar, S.: Accelerating large-scale inference with anisotropic vector quantization. In: International Conference on Machine Learning. PMLR, 3887-3896 (2020).
Iwasaki, M. and Miyazaki, D.: Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data. https://github.com/yahoojapan/NGT (2018).
Aumüller, M.; Bernhardsson, E.; Faithfull, A.: ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. In: Information Systems 87 (2020), 101374. https://doi.org/10.1016/j.is.2019.02.006
Qu, Y.; Ding, Y.; Liu, J.; Liu, K.; Ren, R.; Zhao, W.X.; Dong, D.; Wu, H. and Wang, H.: RocketQA: An optimized training approach to dense passage retrieval for open-domain question answering. In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 5835-5847 (2021).
Singh, A.; Subramanya, S.J.; Krishnaswamy, R.; Simhadri, H.V.: FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search. In: arXiv preprint arXiv:2105.09613 (2021).
Douze, M.; Guzhva, A.; Deng, C.; Johnson, J.; Szilvasy, G.; Mazaré, P.E.; Lomeli, M.; Hosseini, L.; Jégou, H.: The Faiss library. In: arXiv preprint arXiv:2401.08281 (2024).
Tepper, M.; Bhati, I.; Aguerrebere, C.; Hildebrand, M.; Willke, T.: LeanVec: Search your vectors faster by making them fit. In: Transactions on Machine Learning Research (TMLR), ISSN 2835-8856 (2024).