Library Features
Here we present the main library features, including the supported index types, distance functions and data types.
Index Types
SVS supports the following index types:
Graphs for static datasets
The Vamana graph (in Python, in C++) enables fast in-memory graph-based similarity search with high accuracy for static databases, where the database is fixed and never updated.
Graphs for streaming data
The DynamicVamana graph (in Python, in C++) enables fast in-memory graph-based similarity search with high accuracy for streaming data, where the database is built dynamically by adding and removing vectors.
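For illustration, a minimal sketch of the streaming workflow might look as follows. It assumes the svs Python package with uint64 external ids; the exact signatures of svs.DynamicVamana.build, add_points, and delete_points should be checked against the API reference.

import numpy as np
import svs

# Synthetic float32 vectors and external ids (assumed to be uint64).
data = np.random.rand(10_000, 128).astype(np.float32)
ids = np.arange(10_000, dtype=np.uint64)

# Graph construction parameters (illustrative values).
parameters = svs.VamanaBuildParameters(graph_max_degree=64, window_size=128, alpha=1.2)

# Build the index, then mutate it as the stream evolves.
index = svs.DynamicVamana.build(parameters, data, ids, svs.DistanceType.L2, num_threads=4)

# Add newly arrived vectors and remove vectors that are no longer needed.
new_vectors = np.random.rand(100, 128).astype(np.float32)
new_ids = np.arange(10_000, 10_100, dtype=np.uint64)
index.add_points(new_vectors, new_ids)
index.delete_points(ids[:100])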
Flat Index
The flat index (in Python, in C++) can be used to run an exhaustive search, which is useful, for example, to compute the ground-truth nearest neighbors for a dataset.
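For example, exact nearest neighbors can be computed with a sketch like the following; the file name is a placeholder and the constructor arguments should be checked against the svs.Flat reference.

import numpy as np
import svs

# Load the database vectors (placeholder file name).
data = svs.VectorDataLoader("data_f32.svs", svs.DataType.float32)

# Brute-force index: no graph is built, every query is compared against all vectors.
index = svs.Flat(data, distance=svs.DistanceType.L2)

# Queries must match the dataset dimensionality; the exact results can be saved
# as the ground truth for recall measurements of the graph indexes.
queries = np.random.rand(1_000, 128).astype(np.float32)
ground_truth, distances = index.search(queries, 10)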
Distance functions
SVS supports the distance functions listed in Built-In Distance Functors (see svs.DistanceType for the corresponding Python classes). The distance function is specified when the index is created via the corresponding index constructor. In the case of the Vamana index, it must also be specified when the graph is built (see svs.Vamana.build and svs::Vamana::build() for details).
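As a sketch, assuming the Python API referenced above and a placeholder file name, the distance function could be passed as follows.

import svs

# Loader for the vectors to index (placeholder file name).
data = svs.VectorDataLoader("data_f32.svs", svs.DataType.float32)

# Graph construction parameters (illustrative values).
parameters = svs.VamanaBuildParameters(graph_max_degree=64, window_size=128)

# The distance function is passed when the graph is built; it must also be
# supplied to the index constructor when a saved index is loaded later.
index = svs.Vamana.build(parameters, data, svs.DistanceType.MIP, num_threads=4)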
Data types
The supported data types are: float32, float16, int8 and uint8. Other data types might work but performance has not been tested.
The data type can be set independently for the database vectors and the query vector. For example, one could compress the database vectors to float16, which allows for a 2x storage reduction often with negligible accuracy loss, and keep the query in float32.
In Python
The data type for the database vectors is specified by the data_type argument when the vectors are loaded with svs.VectorDataLoader. The data type for the query vectors is specified in the query_type argument of the corresponding index constructors (svs.Vamana, svs.Flat).
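For example, a float16 database can be paired with float32 queries as in the following sketch (placeholder file name; the distance keyword is an assumption, while data_type and query_type are the arguments described above).

import svs

# Database vectors stored on disk as float16; no conversion is performed,
# so the data_type must match how the file was written (see the warning below).
data = svs.VectorDataLoader("data_f16.svs", svs.DataType.float16)

# Queries are kept in float32 via the query_type argument of the constructor.
index = svs.Flat(data, distance=svs.DistanceType.L2, query_type=svs.DataType.float32)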
In C++
The data type of the database vectors is specified in the template argument of svs::VectorDataLoader:
svs::VectorDataLoader<float>("data_f32.svs")
For details on setting the data type of the query vectors, see Indexes.
Warning
This will not perform any dataset conversion. If a dataset was saved to disk as float16 data, for example, then it must be loaded with data_type = svs.DataType.float16 in Python or svs::Float16 in C++.
Supported (query, database vector) data type combinations include (float32, float32), (float32, float16), (uint8, uint8), and (int8, int8), among others.
Vector compression
Note
The open-source SVS library supports all documented features except our proprietary vector compression methods, LVQ and LeanVec, which are not open-source and run only on Intel CPUs. These are available via a shared library (C++) and a PyPI package (Python). See the examples for Python and C++ usage.
LeanVec and LVQ Compression Techniques
SVS incorporates two novel compression strategies, LVQ [ABHT23] and LeanVec [TBAH24], to enhance memory efficiency and accelerate similarity search operations. These techniques compress high-dimensional vectors while maintaining the spatial relationships necessary for accurate retrieval. See Choosing the Right Compression for details on selecting the best approach for your case.
LVQ: Locally-Adaptive Vector Quantization
LVQ employs a combination of per-vector normalization and scalar quantization to reduce memory footprint. It supports rapid distance calculations, especially when paired with SIMD-optimized layouts like Turbo LVQ. The compression parameters are learned from the input data, allowing for adaptive and efficient encoding.
See this example for details on how to use LVQ and the guidelines to choose the right compression strategy.
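To make the idea concrete, the NumPy sketch below illustrates the general principle of dataset-mean removal followed by per-vector scalar quantization. It is only a conceptual illustration; it does not reproduce the actual LVQ encoding, layouts, or SIMD optimizations.

import numpy as np

def lvq_like_encode(data, bits=8):
    # Conceptual illustration only, not the SVS LVQ implementation.
    mean = data.mean(axis=0)                     # dataset-level normalization
    centered = data - mean
    lo = centered.min(axis=1, keepdims=True)     # per-vector constants
    hi = centered.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-12) / (2**bits - 1)
    codes = np.round((centered - lo) / scale).astype(np.uint8)
    return codes, lo, scale, mean

def lvq_like_decode(codes, lo, scale, mean):
    # Reconstruct approximate vectors from the codes and per-vector constants.
    return codes.astype(np.float32) * scale + lo + mean

data = np.random.rand(1000, 128).astype(np.float32)
codes, lo, scale, mean = lvq_like_encode(data, bits=8)
approx = lvq_like_decode(codes, lo, scale, mean)
print("max reconstruction error:", np.abs(approx - data).max())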
LeanVec Compression
LeanVec builds upon LVQ by integrating dimensionality reduction, making it particularly effective for very high-dimensional datasets. It delivers significant performance improvements while conserving memory. LeanVec is designed to handle both cases where queries follow the same distribution as the base vectors (in-distribution) and cases where they follow a different distribution (out-of-distribution), such as in cross-modal search tasks like text-to-image retrieval.
See these examples in Python and C++ for details on how to use LeanVec and the guidelines to choose the right compression strategy.
Note
Support for out-of-distribution queries is in experimental mode. You can try it using our Python benchmarking tool.
Two-Level Compression
Both LVQ and LeanVec support a dual-stage compression scheme. In LVQ, the first stage quantizes the vector to capture its core structure, while the second stage encodes the residual error for improved accuracy. The initial quantization enables fast candidate retrieval with B₁ bits per dimension, and the residuals encoded with B₂ bits per dimension are used for re-ranking. For example, LVQ4x8 uses 4 bits per dimension for fast candidate retrieval and 8 bits per dimension for re-ranking. LVQ can also operate as a one-level scheme, using B₁ bits per dimension without re-ranking (e.g., LVQ8).
LeanVec follows a similar approach: the first level reduces dimensionality and applies LVQ for fast search, and the second level applies LVQ to the original vectors for precise re-ranking. Importantly, neither method relies on full-precision vectors – everything operates on compressed data.
Naming Convention
Compression configurations follow the format: LVQ<B₁>x<B₂>, where:
B₁: Bits per dimension for the first-level quantization (primary).
B₂: Bits per dimension for the second-level residual encoding (residual).
Examples:
LVQ4x8: 4 bits for initial quantization, 8 bits for residuals (total 12 bits per dimension).
LVQ8: Single-level compression using 8 bits per dimension.
LeanVec uses the same naming scheme. The primary_kind and secondary_kind arguments set the number of bits per dimension for the first and second levels, respectively.
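As a quick back-of-the-envelope check of these figures, the snippet below computes the compression ratio of a configuration relative to uncompressed float32 storage, ignoring the small per-vector constants stored alongside the codes.

FLOAT32_BITS = 32

def compression_ratio(primary_bits, residual_bits=0):
    # Ratio of float32 storage to LVQ<B1>x<B2> storage per dimension.
    return FLOAT32_BITS / (primary_bits + residual_bits)

print(compression_ratio(8))     # LVQ8:   32 / 8  = 4.0x
print(compression_ratio(4, 8))  # LVQ4x8: 32 / 12 ≈ 2.7x
print(compression_ratio(4))     # Primary level of LVQ4x8 alone (candidate retrieval): 8.0x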
Training and Adaptability
The effectiveness of LVQ and LeanVec stems from their ability to learn compression parameters from the data itself. This requires a representative sample of vectors during index initialization. If the data distribution shifts significantly over time, compression quality may degrade – a common challenge for all data-dependent methods.
8-bit scalar quantization
The open-source SVS library supports 8-bit scalar quantization. It uses the global minimum and maximum values across all embeddings to scale them, then applies uniform quantization per dimension using 8 bits. This functionality is currently available only in the C++ implementation, with support for Python bindings coming soon.
See this example for details on how to use the 8-bit scalar quantization.
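Although this feature is currently C++-only, the quantization rule is simple to illustrate. The NumPy sketch below is a conceptual illustration of global min/max scaling with 8-bit uniform quantization, not the SVS implementation.

import numpy as np

def scalar_quantize_8bit(data):
    # Conceptual illustration only, not the SVS C++ implementation.
    lo = data.min()                       # global minimum across all embeddings
    hi = data.max()                       # global maximum across all embeddings
    scale = (hi - lo) / 255.0
    codes = np.round((data - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

data = np.random.rand(1000, 128).astype(np.float32)
codes, lo, scale = scalar_quantize_8bit(data)
print("max error:", np.abs(dequantize(codes, lo, scale) - data).max())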
References
Subramanya, S.J.; Devvrit, F.; Simhadri, H.V.; Krishnaswamy, R.; Kadekodi, R.: DiskANN: Fast accurate billion-point nearest neighbor search on a single node. In: Advances in Neural Information Processing Systems 32 (2019).
Aguerrebere, C.; Bhati, I.; Hildebrand, M.; Tepper, M.; Willke, T.: Similarity search in the blink of an eye with compressed indices. In: Proceedings of the VLDB Endowment 16, 11 (2023), 3433-3446.
Aguerrebere, C.; Hildebrand, M.; Bhati, I.; Willke, T.; Tepper, M.: Locally-adaptive Quantization for Streaming Vector Search. In: arXiv preprint arXiv:2402.02044 (2024).
Malkov, Y.A. and Yashunin, D.A.: Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2018), 824-836.
Johnson, J.; Douze, M.; Jégou, H.: Billion-scale similarity search with GPUs. In: IEEE Transactions on Big Data 7, 3 (2019), 535-547.
Guo, R.; Sun, P.; Lindgren, E.; Geng, Q.; Simcha, D.; Chern, F.; Kumar, S.: Accelerating large-scale inference with anisotropic vector quantization. In: International Conference on Machine Learning. PMLR, 3887-3896 (2020).
Iwasaki, M. and Miyazaki, D.: Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data. https://github.com/yahoojapan/NGT (2018).
Aumüller, M.; Bernhardsson, E.; Faithfull, A.: ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. In: Information Systems 87 (2020), 101374. https://doi.org/10.1016/j.is.2019.02.006
Qu, Y.; Ding, Y.; Liu, J.; Liu, K.; Ren, R.; Zhao, W.X.; Dong, D.; Wu, H. and Wang, H.: RocketQA: An optimized training approach to dense passage retrieval for open-domain question answering. In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 5835-5847 (2021).
Singh, A.; Subramanya, S.J.; Krishnaswamy, R.; Simhadri, H.V.: FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search. In: arXiv preprint arXiv:2105.09613 (2021).
Douze, M.; Guzhva, A.; Deng, C.; Johnson, J.; Szilvasy, G.; Mazaré, P.E.; Lomeli, M.; Hosseini, L.; Jégou, H.: The Faiss library. In: arXiv preprint arXiv:2401.08281 (2024).
Tepper, M.; Bhati, I.; Aguerrebere, C.; Hildebrand, M.; Willke, T.: LeanVec: Search your vectors faster by making them fit. In: Transactions on Machine Learning Research (TMLR), ISSN 2835-8856 (2024).