Latest Results
We compare SVS to the widely adopted implementations HNSWlib [MaYa18] and FAISS's IVFPQfs [DGDJ24] on datasets covering a wide range of practical scenarios, including different scales (\(n=10^6\) to \(n=10^9\)), dimensionalities (\(d=96, 200, 512, 768\)), and deep learning modalities (text, image, and multimodal). A summary of the experimental setup is presented below.
Results summary:

- At very large scale (over 10M vectors, up to 768-dimensional), SVS outperforms its competitors in terms of QPS by up to:
  - 13.5x for a large query batch size (10k),
  - 8.1x for a small query batch size (10),
  - 4.4x for single queries.
- At million scale, SVS outperforms its competitors in terms of QPS by up to:
  - 6.5x for a large query batch size (10k),
  - 9.6x for a small query batch size (10),
  - 4.3x for single queries.

All values are reported for a search accuracy of 0.9 (10 recall at 10).

The table below summarizes the QPS boost with respect to SVS's closest competitor for different query batch sizes, at a search accuracy of 0.9 (10 recall at 10).
| Dataset | SVS QPS (batch size = 10) | QPS w.r.t. 2nd (batch size = 10) | SVS QPS (batch size = 10k) | QPS w.r.t. 2nd (batch size = 10k) |
|---|---|---|---|---|
| **Over million scale** | | | | |
| deep-96-1B | 95931 | 7.0x | 473771 | 5.6x |
| deep-96-100M | 140505 | 4.5x | 728430 | 4.7x |
| t2i-200-10M | 47148 | 3.2x | 276725 | 3.8x |
| open-images-512-13M | 79507 | 3.3x | 406301 | 4.9x |
| rqa-768-10M-OOD | 23296 | 8.1x | 111263 | 13.5x |
| **Million scale** | | | | |
| deep-96-1M | 385542 | 3.4x | 2504286 | 3.2x |
| t2i-200-1M | 117024 | 3.1x | 712564 | 3.7x |
| open-images-512-1M | 121425 | 2.7x | 675808 | 2.8x |
| rqa-768-1M-OOD | 50138 | 9.6x | 240732 | 6.5x |
The figure below shows the QPS vs recall curves for the rqa-768-10M-OOD dataset (10M 768-dimensional embeddings generated with the dense passage retriever model RocketQA [QDLL21] with out-of-distribution queries).
Single Query
We ran the benchmarks for the single query case (query batch size = 1) on a representative subset of the datasets. The table below summarizes the QPS boost with respect to SVS's closest competitor at a search accuracy of 0.9 (10 recall at 10).
| Dataset | SVS QPS (batch size = 1) | QPS w.r.t. 2nd |
|---|---|---|
| deep-96-100M | 17561 | 4.3x |
| open-images-512-13M | 9757 | 3.2x |
| rqa-768-10M-OOD | 1620 | 4.4x |
| open-images-512-1M | 15370 | 2.6x |
| rqa-768-1M-OOD | 2928 | 4.3x |
The figure below shows the single-query QPS vs recall curves for the rqa-768-10M-OOD dataset (10M 768-dimensional embeddings generated with the dense passage retriever model RocketQA [QDLL21], with out-of-distribution queries).
Datasets
To cover a wide range of use cases, we evaluate SVS on standard datasets of diverse dimensionalities (\(d=96\) to \(d=768\)), number of elements (\(n=10^6\) to \(n=10^9\)), and metrics as described in the table below.
| Dataset | d | n | Encoding | Similarity | n queries | Space (GiB) |
|---|---|---|---|---|---|---|
| deep-96-1B | 96 | 1B | float32 | cosine similarity | 10000 | 257.6 |
| deep-96-100M | 96 | 100M | float32 | cosine similarity | 10000 | 35.8 |
| deep-96-1M | 96 | 1M | float32 | cosine similarity | 10000 | 0.36 |
| t2i-200-10M | 200 | 10M | float32 | inner product | 10000 | 7.5 |
| t2i-200-1M | 200 | 1M | float32 | inner product | 10000 | 0.75 |
| open-images-512-13M | 512 | 13M | float32 | cosine similarity | 10000 | 24.8 |
| open-images-512-1M | 512 | 1M | float32 | cosine similarity | 10000 | 2.5 |
| rqa-768-10M-OOD | 768 | 10M | float32 | inner product | 10000 | 28.6 |
| rqa-768-1M-OOD | 768 | 1M | float32 | inner product | 10000 | 2.9 |
System Setup
We run our experiments (except for the deep-96-1B dataset) on a 2-socket 4th generation Intel® Xeon® Platinum 8480L system with 56 cores per socket, equipped with 512GB of DDR5 memory per socket @4800MT/s, running Ubuntu 22.04. [1] [2] For the deep-96-1B dataset, we use a server with the same characteristics except that it is equipped with 1TB of DDR5 memory per socket @4400MT/s.
We use numactl to run all experiments on a single socket (see NUMA Systems for details).
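As an illustration, a single-socket pinned run can be launched from Python by prefixing the benchmark command with numactl (a minimal sketch; `./run_benchmark` and its flags are hypothetical placeholders, while the numactl flags themselves are standard):

```python
import subprocess

def numa_pinned_command(node: int, benchmark_cmd: list) -> list:
    """Prefix a command with numactl flags that bind both CPU execution
    and memory allocation to a single NUMA node (socket)."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}"] + benchmark_cmd

# Hypothetical benchmark entry point; replace with the real command.
cmd = numa_pinned_command(0, ["./run_benchmark", "--threads", "112"])
print(cmd)
# To actually launch it (requires numactl to be installed):
# subprocess.run(cmd, check=True)
```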
We use the hugeadm Linux utility to preallocate a sufficient number of 1GB huge pages for each algorithm.
SVS explicitly uses huge pages to reduce the virtual memory overheads.
For a fair comparison, we run other methods with system flags enabled to automatically use huge pages for large allocations.
For the deep-96-1B dataset, we were not able to use huge pages for the other methods as this causes an out-of-memory issue.
The large query batch size experiments (batch size = 10k) are run using 112 threads (hyper-threading enabled on a single socket), whereas the small query batch size experiments (batch size = 10) are run using 10 threads. Using fewer threads benefits all methods in the small query batch size case.
Parameters Setting
We run different parameter settings for each method and generate Pareto curves with the best QPS for each recall value.
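The Pareto frontier (best QPS at each recall level) can be computed with a few lines of Python. This is a minimal sketch; the (recall, QPS) pairs below are made-up illustrative numbers, not measurements:

```python
def pareto_frontier(points):
    """Given (recall, qps) pairs from all parameter settings of one method,
    keep only the points for which no other point achieves both higher
    recall and higher QPS."""
    # Sort by recall descending, then QPS descending.
    pts = sorted(points, key=lambda p: (-p[0], -p[1]))
    frontier = []
    best_qps = float("-inf")
    for recall, qps in pts:
        if qps > best_qps:  # strictly better QPS than every higher-recall point
            frontier.append((recall, qps))
            best_qps = qps
    return sorted(frontier)  # ascending recall, ready for plotting

# Illustrative sweep results from a hypothetical parameter grid:
runs = [(0.85, 50000), (0.90, 42000), (0.90, 30000), (0.95, 20000), (0.80, 45000)]
print(pareto_frontier(runs))
```

The dominated points (e.g. (0.90, 30000) and (0.80, 45000) above) are dropped, leaving one curve per method.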
For SVS, we build graphs with graph_max_degree = 128 and alpha = 1.2 or 0.95 when using Euclidean distance or inner product, respectively.
For HNSWlib, we build graphs with graph_max_degree = 32, 64, 128 (this corresponds to M = 16, 32, 64 in HNSW notation), except for the deep-96-1B dataset, for which we use graph_max_degree = 64 (M = 32) to be able to fit the working set in 1TB of memory.
For FAISS-IVFPQfs, we use nlist = 4096, 8192, 16384 and nbins \(=48\) (and nbins \(=96\) for nlist = 8192) for all datasets except deep-96-1B, for which we use nlist = 32768 and nbins \(=48\). Re-ranking is enabled, and at runtime we sweep nprobe \(=[1,5,10,50,100,200]\) and k for re-ranking \(= [0,10,100,1000]\).
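The swept parameter grids above can be collected in a config-style snippet. The values are taken directly from the settings just listed; the dictionary layout itself is only an illustrative convention:

```python
# Build parameters for SVS (alpha depends on the similarity function).
SVS_BUILD = {
    "graph_max_degree": 128,
    "alpha": {"euclidean": 1.2, "inner_product": 0.95},
}

# HNSWlib build grid; graph_max_degree = 2*M in HNSW notation.
# deep-96-1B uses only graph_max_degree = 64 (M = 32).
HNSWLIB_BUILD = {"graph_max_degree": [32, 64, 128]}

# FAISS-IVFPQfs build and runtime grids.
# deep-96-1B uses nlist = 32768 with nbins = 48 instead.
FAISS_IVFPQFS = {
    "nlist": [4096, 8192, 16384],
    "nbins": [48, 96],               # nbins = 96 only combined with nlist = 8192
    "nprobe": [1, 5, 10, 50, 100, 200],
    "k_rerank": [0, 10, 100, 1000],  # re-ranking enabled
}
```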
We used the following versions of each method: SVS commit a43c902, HNSWlib Python package hnswlib-0.8.0, and FAISS-IVFPQfs via faiss-cpu version 1.8.0.
Evaluation Metrics
In all benchmarks and experimental results, search accuracy is measured by k-recall at k, defined by \(| S \cap G_t | / k\), where \(S\) are the ids of the \(k\) retrieved neighbors and \(G_t\) is the ground-truth. We use \(k=10\) in all experiments. Search performance is measured by queries per second (QPS).
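As a concrete check of the definition, k-recall at k can be computed as follows (a minimal sketch; the id lists are made-up examples, not dataset ground truth):

```python
def k_recall_at_k(retrieved_ids, ground_truth_ids, k):
    """|S ∩ G_t| / k, where S holds the ids of the first k retrieved
    neighbors and G_t the ids of the true k nearest neighbors."""
    return len(set(retrieved_ids[:k]) & set(ground_truth_ids[:k])) / k

# Example: 9 of the 10 retrieved ids appear in the ground truth -> 0.9,
# the accuracy level used throughout the tables above.
retrieved    = [3, 7, 1, 9, 4, 12, 8, 5, 2, 40]
ground_truth = [3, 7, 1, 9, 4, 12, 8, 5, 2, 11]
print(k_recall_at_k(retrieved, ground_truth, k=10))  # 0.9
```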
Footnotes