Experimental Setup
Datasets
To cover a wide range of use cases, we evaluate SVS on standard datasets of diverse dimensionalities (d = 25 to 960), cardinalities (n = 1M to 1B), encodings (float32 and uint8), and similarity functions (L2, cosine similarity, and inner product), summarized in the table below.
Dataset | d   | n    | Encoding | Similarity        | n queries | Space (GiB)
--------|-----|------|----------|-------------------|-----------|------------
—       | 960 | 1M   | float32  | L2                | 1000      | 3.6
—       | 128 | 1M   | float32  | L2                | 10000     | 0.5
—       | 96  | 10M  | float32  | cosine similarity | 10000     | 3.6
—       | 50  | 1.2M | float32  | cosine similarity | 10000     | 0.2
—       | 25  | 1.2M | float32  | cosine similarity | 10000     | 0.1
—       | 200 | 100M | float32  | inner product     | 10000     | 74.5
—       | 96  | 100M | float32  | cosine similarity | 10000     | 35.8
—       | 96  | 1B   | float32  | cosine similarity | 10000     | 357.6
—       | 128 | 1B   | uint8    | L2                | 10000     | 119.2
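The Space column follows directly from the raw footprint n · d · bytes-per-element (4 bytes for float32, 1 byte for uint8). A minimal sanity check, assuming the usual GiB = 2^30 bytes convention:

```python
# Consistency check for the Space (GiB) column:
# raw footprint = n * d * bytes-per-element.
ELEMENT_BYTES = {"float32": 4, "uint8": 1}

rows = [  # (d, n, encoding, reported GiB) -- a few rows from the table above
    (960, 1_000_000, "float32", 3.6),
    (96, 1_000_000_000, "float32", 357.6),
    (128, 1_000_000_000, "uint8", 119.2),
]

for d, n, encoding, reported in rows:
    computed = n * d * ELEMENT_BYTES[encoding] / 2**30  # bytes -> GiB
    print(f"d={d:3d}  n={n:>13,}  {encoding:7s}  "
          f"computed={computed:5.1f} GiB  reported={reported}")
```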
Evaluation Metrics
In all benchmarks and experimental results, search accuracy is measured by k-recall at k (k-recall@k), defined as |S ∩ G_t| / k, where S is the set of ids of the k retrieved neighbors and G_t is the ground truth, i.e., the ids of the k true nearest neighbors.
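As a concrete reference, here is a minimal NumPy sketch of this metric (the function name and array layout are our own illustration):

```python
import numpy as np

def k_recall_at_k(retrieved: np.ndarray, groundtruth: np.ndarray, k: int) -> float:
    """Mean k-recall@k over a query batch.

    retrieved:   (n_queries, >=k) ids returned by the index, best first.
    groundtruth: (n_queries, >=k) ids of the true k nearest neighbors.
    """
    per_query = [
        len(set(retrieved[q, :k]) & set(groundtruth[q, :k])) / k
        for q in range(retrieved.shape[0])
    ]
    return float(np.mean(per_query))
```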
Query Batch Size
The query batch size, which depends on the use case, can have a significant impact on performance. We therefore evaluate three batch sizes: 1 (one query at a time, i.e., single query), 128 (a typical batch size for deep learning training and inference), and full batch (all queries at once, so the batch size equals the number of queries in the dataset, see Datasets).
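A hedged sketch of this measurement protocol, where `index.search(batch, k)` is a hypothetical stand-in for the search call of the library under test, not an actual SVS API:

```python
import time
import numpy as np

def throughput(index, queries: np.ndarray, k: int, batch_size: int) -> float:
    """Queries per second when submitting `queries` in batches of `batch_size`.

    `index.search(batch, k)` is a hypothetical placeholder for the
    library-specific search call.
    """
    start = time.perf_counter()
    for i in range(0, len(queries), batch_size):
        index.search(queries[i : i + batch_size], k)
    return len(queries) / (time.perf_counter() - start)

# The three regimes evaluated (k=10 is illustrative only):
# for bs in (1, 128, len(queries)):
#     print(bs, throughput(index, queries, k=10, batch_size=bs))
```

With batch_size=1 this measures single-query behavior, while the full-batch setting exposes how well the implementation exploits parallelism across queries.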