Experimental Setup

Datasets

To cover a wide range of use cases, we evaluate SVS on standard datasets of diverse dimensionalities (\(d=25\) to \(d=960\)), numbers of elements (\(n=10^6\) to \(n=10^9\)), data types, and similarity metrics, as described in the table below.

| Dataset       | d   | n    | Encoding | Similarity        | # Queries | Space (GiB) |
|---------------|-----|------|----------|-------------------|-----------|-------------|
| gist-960-1M   | 960 | 1M   | float32  | L2                | 1000      | 3.6         |
| sift-128-1M   | 128 | 1M   | float32  | L2                | 10000     | 0.5         |
| deep-96-10M   | 96  | 10M  | float32  | cosine similarity | 10000     | 3.6         |
| glove-50-1.2M | 50  | 1.2M | float32  | cosine similarity | 10000     | 0.2         |
| glove-25-1.2M | 25  | 1.2M | float32  | cosine similarity | 10000     | 0.1         |
| t2i-200-100M  | 200 | 100M | float32  | inner product     | 10000     | 74.5        |
| deep-96-100M  | 96  | 100M | float32  | cosine similarity | 10000     | 35.8        |
| deep-96-1B    | 96  | 1B   | float32  | cosine similarity | 10000     | 357.6       |
| sift-128-1B   | 128 | 1B   | uint8    | L2                | 10000     | 119.2       |
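Each entry in the Space column follows directly from \(n\), \(d\), and the encoding width. A minimal sketch of that arithmetic (the helper below is ours, for illustration only):

```python
# Bytes per element for each encoding appearing in the table.
ENCODING_BYTES = {"float32": 4, "uint8": 1}

def raw_footprint_gib(n: int, d: int, encoding: str) -> float:
    """Raw size of an n x d dataset in GiB (1 GiB = 2**30 bytes)."""
    return n * d * ENCODING_BYTES[encoding] / 2**30

# A few rows from the table above:
print(f"{raw_footprint_gib(10**6, 960, 'float32'):.1f} GiB")  # gist-960-1M -> 3.6
print(f"{raw_footprint_gib(10**9, 128, 'uint8'):.1f} GiB")    # sift-128-1B -> 119.2
print(f"{raw_footprint_gib(10**9, 96, 'float32'):.1f} GiB")   # deep-96-1B  -> 357.6
```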

Evaluation Metrics

In all benchmarks and experimental results, search accuracy is measured by k-recall at k, defined as \(|S \cap G_t| / k\), where \(S\) is the set of ids of the \(k\) retrieved neighbors and \(G_t\) is the set of ids of the true \(k\) nearest neighbors (the ground truth). We use \(k=10\) in all experiments. Search performance is measured in queries per second (QPS).
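As a concrete illustration, a minimal sketch of this metric in Python (the helper name is ours, not part of SVS):

```python
import numpy as np

def k_recall_at_k(retrieved_ids, ground_truth_ids) -> float:
    """k-recall@k for a single query: |S ∩ G_t| / k, with k = len(retrieved_ids)."""
    k = len(retrieved_ids)
    return len(set(retrieved_ids) & set(ground_truth_ids[:k])) / k

# Example: 8 of the 10 retrieved ids are among the true 10 nearest neighbors -> 0.8.
retrieved = np.array([3, 7, 1, 9, 4, 2, 8, 6, 50, 60])
truth = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(k_recall_at_k(retrieved, truth))  # 0.8
```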

Query Batch Size

The size of the query batch, which depends on the use case, can have a large impact on performance. We therefore evaluate three batch sizes: 1 (a single query at a time), 128 (a typical batch size for deep learning training and inference), and full batch (all queries in the dataset at once; see Datasets).
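A hedged sketch of this measurement protocol in Python, assuming a generic index object exposing a search(batch, k) method (a placeholder for the library under test, not necessarily the SVS API):

```python
import time
import numpy as np

def measure_qps(index, queries: np.ndarray, batch_size: int, k: int = 10) -> float:
    """Issue all queries in batches of `batch_size` and return queries per second.

    `index` is assumed to expose a search(batch, k) method returning the ids of
    the k nearest neighbors; substitute the actual call of the library under test.
    """
    start = time.perf_counter()
    for i in range(0, len(queries), batch_size):
        index.search(queries[i : i + batch_size], k)
    return len(queries) / (time.perf_counter() - start)

# The three regimes from the protocol above: single query, 128, and full batch.
# for bs in (1, 128, len(queries)):
#     print(f"batch={bs}: {measure_qps(index, queries, bs):.0f} QPS")
```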