Experimental Setup
Datasets
To cover a wide range of use cases, we evaluate SVS on standard datasets of diverse dimensionalities (\(d=25\) to \(d=960\)), numbers of elements (\(n=10^6\) to \(n=10^9\)), data types, and similarity metrics, as described in the table below.
| Dataset | d | n | Encoding | Similarity | n queries | Space (GiB) |
|---------------|-----|------|---------|-------------------|-------|-------|
| gist-960-1M | 960 | 1M | float32 | L2 | 1000 | 3.6 |
| sift-128-1M | 128 | 1M | float32 | L2 | 10000 | 0.5 |
| deep-96-10M | 96 | 10M | float32 | cosine similarity | 10000 | 3.6 |
| glove-50-1.2M | 50 | 1.2M | float32 | cosine similarity | 10000 | 0.2 |
| glove-25-1.2M | 25 | 1.2M | float32 | cosine similarity | 10000 | 0.1 |
| t2i-200-100M | 200 | 100M | float32 | inner product | 10000 | 74.5 |
| deep-96-100M | 96 | 100M | float32 | cosine similarity | 10000 | 35.8 |
| deep-96-1B | 96 | 1B | float32 | cosine similarity | 10000 | 357.6 |
| sift-128-1B | 128 | 1B | uint8 | L2 | 10000 | 119.2 |
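The Space column corresponds to the raw footprint of the vectors alone, \(n \times d \times\) bytes per element. A quick sanity check (the helper `dataset_gib` is our own, assuming 4 bytes per float32 value and 1 byte per uint8 value):

```python
def dataset_gib(n: int, d: int, bytes_per_element: int) -> float:
    """Raw vector footprint in GiB: n * d * bytes per element / 2**30."""
    return n * d * bytes_per_element / 2**30

print(f"{dataset_gib(10**6, 960, 4):.1f}")   # gist-960-1M (float32) -> 3.6
print(f"{dataset_gib(10**9, 96, 4):.1f}")    # deep-96-1B  (float32) -> 357.6
print(f"{dataset_gib(10**9, 128, 1):.1f}")   # sift-128-1B (uint8)   -> 119.2
```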
Evaluation Metrics
In all benchmarks and experimental results, search accuracy is measured by \(k\)-recall at \(k\), defined as \(| S \cap G_t | / k\), where \(S\) is the set of ids of the \(k\) retrieved neighbors and \(G_t\) is the set of ids of the ground-truth \(k\) nearest neighbors. We use \(k=10\) in all experiments. Search performance is measured in queries per second (QPS).
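For reference, a minimal NumPy sketch of this metric could look as follows (the helper name `recall_at_k` and the array layout are our own conventions, not part of SVS):

```python
import numpy as np

def recall_at_k(retrieved_ids: np.ndarray, ground_truth: np.ndarray, k: int = 10) -> float:
    """Mean k-recall@k: |S ∩ G_t| / k, averaged over all queries.

    retrieved_ids: (n_queries, k) ids returned by the index.
    ground_truth:  (n_queries, >=k) ids of the exact nearest neighbors.
    """
    hits = sum(
        len(np.intersect1d(s, gt))  # |S ∩ G_t| for one query
        for s, gt in zip(retrieved_ids, ground_truth[:, :k])
    )
    return hits / (retrieved_ids.shape[0] * k)
```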
Query Batch Size
The size of the query batch, which depends on the use case, can have a large impact on performance. We therefore evaluate three batch sizes: 1 (one query at a time, or single query), 128 (a typical batch size for deep learning training and inference), and full batch (determined by the number of queries in the dataset; see Datasets).
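As an illustration, QPS at a given batch size can be timed along these lines; `index.search(batch, k)` below is a stand-in for the library's batched search call, not a documented SVS signature:

```python
import time
import numpy as np

def measure_qps(index, queries: np.ndarray, batch_size: int, k: int = 10) -> float:
    """Run all queries in batches of `batch_size` and report queries per second."""
    n = queries.shape[0]
    start = time.perf_counter()
    for i in range(0, n, batch_size):
        index.search(queries[i : i + batch_size], k)  # stand-in batched search
    return n / (time.perf_counter() - start)

# batch_size=1   -> single-query mode
# batch_size=128 -> typical deep learning batch
# batch_size=n   -> full batch (all queries at once)
```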