Redis Vector Search Optimization Guide

This guide describes best practices for optimizing vector similarity search performance in Redis on Intel Xeon processors. Redis 8.2+ includes SVS-VAMANA, a graph-based vector index with Intel’s compression technologies (LVQ and LeanVec) from Intel’s Scalable Vector Search (SVS) library.

Overview
SVS-VAMANA Configuration
Vector Compression
Performance Tuning
Benchmarks
FAQ
References

Overview

Redis Query Engine supports three vector index types: FLAT, HNSW, and SVS-VAMANA. SVS-VAMANA combines the Vamana graph-based search algorithm (Subramanya et al., NeurIPS 2019) with Intel’s compression technologies (LVQ and LeanVec), delivering optimal performance on servers with AVX-512 support.

Key Benefits of SVS-VAMANA:

Memory Efficiency: 26–37% total memory savings compared to HNSW, with 51–74% reduction in index memory
Higher Throughput: Up to 144% higher QPS compared to HNSW on high-dimensional datasets
Lower Latency: Up to 60% reduction in p50/p95 latencies under load
Maintained Accuracy: Matches HNSW precision levels while delivering performance improvements

SVS-VAMANA Configuration

Creating an SVS-VAMANA Index

FT.CREATE my_index
  ON HASH
  PREFIX 1 doc:
  SCHEMA embedding VECTOR SVS-VAMANA 12
    TYPE FLOAT32
    DIM 768
    DISTANCE_METRIC COSINE
    GRAPH_MAX_DEGREE 64
    CONSTRUCTION_WINDOW_SIZE 200
    COMPRESSION LVQ4x8

Index Parameters

Parameter	Description	Default	Tuning Guidance
TYPE	Vector data type (FLOAT16, FLOAT32)	-	FLOAT32 for accuracy, FLOAT16 for memory
DIM	Vector dimensions	-	Must match your embeddings
DISTANCE_METRIC	L2, IP, or COSINE	-	L2 for normalized embeddings
GRAPH_MAX_DEGREE	Max edges per node	32	Equivalent to HNSW’s M × 2; higher = better recall, more memory
CONSTRUCTION_WINDOW_SIZE	Build search window	200	Higher = better graph quality, slower build
SEARCH_WINDOW_SIZE	Query search window	10	Higher = better recall, slower
REDUCE	Target dimension for LeanVec	DIM/2	Lower = faster search, may reduce recall
COMPRESSION	LVQ/LeanVec type	none	See compression section
TRAINING_THRESHOLD	Vectors for learning compression	10240	Increase if recall is low

Vector Compression

Intel SVS provides advanced compression techniques that reduce memory usage while maintaining search quality.

Compression Options

Compression	Bits/Dim	Memory Reduction	Best For
None	32 (FLOAT32)	1x (baseline)	Maximum accuracy
LVQ8	8	~4x	Fast ingestion, good balance
LVQ4x4	4+4	~4x	Fast search, dimensions < 768
LVQ4x8	4+8	~2.5x	High recall with compression
LeanVec4x8	4/f+8	~3x	High-dimensional vectors (768+)
LeanVec8x8	8/f+8	~2.5x	Best recall with LeanVec

The LeanVec dimensionality reduction factor f is the full dimensionality divided by the reduced dimensionality.

Choosing Compression by Use Case

Embedding Category	Example Embeddings	Compression Strategy
Text Embeddings	Cohere embed-v3 (1024), OpenAI ada-002 (1536)	LeanVec4x8
Image Embeddings	ResNet-152 (2048), ViT (768+)	LeanVec4x8
Multimodal	CLIP ViT-B/32 (512)	LVQ8
Lower Dimensional	Custom embeddings (<768)	LVQ4x4 or LVQ4x8

Example with LeanVec Compression

FT.CREATE my_index
  ON HASH
  PREFIX 1 doc:
  SCHEMA embedding VECTOR SVS-VAMANA 12
    TYPE FLOAT32
    DIM 1536
    DISTANCE_METRIC COSINE
    COMPRESSION LeanVec4x8
    REDUCE 384
    TRAINING_THRESHOLD 20000

Performance Tuning

Runtime Query Parameters

Adjust search parameters at query time for precision/performance trade-offs:

FT.SEARCH my_index
  "*=>[KNN 10 @embedding $BLOB SEARCH_WINDOW_SIZE $SW]"
  PARAMS 4 BLOB "\x12\xa9..." SW 50
  DIALECT 2

Parameter	Effect	Trade-off
SEARCH_WINDOW_SIZE	Larger = higher recall	Higher latency

Redis Configuration

# redis.conf optimizations for vector workloads

# Use multiple I/O threads for better throughput
io-threads 4
io-threads-do-reads yes

Benchmarks

Based on Redis benchmarking, SVS-VAMANA delivers significant improvements over HNSW:

Memory Savings

SVS-VAMANA with LVQ8 compression achieves consistent memory reductions across datasets (LVQ8 used as a common baseline; for Cohere and DBpedia embeddings, LeanVec is recommended in production — see Compression section):

Dataset	Dimensions	Total Memory Reduction	Index Memory Reduction
LAION	512	26%	51%
Cohere	768	35%	70%
DBpedia	1536	37%	74%

Throughput Improvements (FP32)

At 0.95 precision, compared to HNSW:

Dataset	Dimensions	QPS Improvement
Cohere	768	Up to 144% higher
DBpedia	1536	Up to 60% higher
LAION	512	0-15% (marginal)

SVS-VAMANA is most effective at improving throughput for medium-to-high dimensional embeddings (768+ dimensions).

Latency Improvements (FP32, High Concurrency)

Dataset	p50 Latency Reduction	p95 Latency Reduction
Cohere (768d)	60%	57%
DBpedia (1536d)	46%	36%

Precision vs. Performance

At every precision point from ~0.92 to 0.99, SVS-VAMANA matches HNSW accuracy while delivering higher throughput. At high precision (0.99), SVS-VAMANA sustains up to 1.5x better throughput.

Ingestion Trade-offs

SVS-VAMANA index construction is slower than HNSW due to graph construction complexity and compression processing. On x86 platforms:

LeanVec: Can be up to 25% faster or 33% slower than HNSW depending on dataset
LVQ: Up to 2.6x slower than HNSW

This trade-off is acceptable for workloads where query performance and memory efficiency are priorities.

FAQ

Q: When should I use SVS-VAMANA vs HNSW?

A: Use SVS-VAMANA when:

Running on Intel Xeon processors with AVX-512
Memory efficiency is important (26-37% savings)
You have medium-to-high dimensional vectors (768+)
Query throughput and latency are priorities

Use HNSW when:

Running on ARM platforms (HNSW offers both faster search and faster ingestion on ARM)
Working with lower-dimensional vectors (<512) and needing faster index construction

Q: Are LVQ and LeanVec available in Redis Open Source?

A: The basic SVS-VAMANA algorithm with 8-bit scalar quantization (SQ8) is available in Redis Open Source on all platforms. Intel’s LVQ and LeanVec optimizations require:

Intel hardware with AVX-512
Redis Software (commercial) or building Redis Open Source with BUILD_INTEL_SVS_OPT=yes

On non-Intel platforms (AMD, ARM), SVS-VAMANA automatically falls back to SQ8 compression—no code changes required.

Q: What if recall is too low with compression?

A: Try these steps in order:

Increase SEARCH_WINDOW_SIZE at query time
For LeanVec, try a larger REDUCE value (closer to original dimensions)
Increase TRAINING_THRESHOLD (e.g., 50000) if using LeanVec
Switch to higher-bit compression (LVQ4x8 → LVQ8, or LeanVec4x8 → LeanVec8x8)
Increase GRAPH_MAX_DEGREE (e.g., 64 or 128)

Q: Does SVS-VAMANA work on non-Intel hardware?

A: Yes! The API is unified and SVS-VAMANA runs on any x86 or ARM platform—no code changes needed. The library automatically selects the best available implementation:

Intel (AVX-512): Full LVQ/LeanVec optimizations for maximum performance
AMD/Other x86: SQ8 fallback implementation, which benchmarks show is also quite fast—often comparable performance
ARM: SQ8 fallback works; however, HNSW may be preferable due to slower SVS ingestion on ARM

Your application code stays the same regardless of hardware. Ideal performance is achieved on Intel Xeon with AVX-512, but you can deploy and test on any platform without modification.

Redis Vector Search Optimization Guide

Data Center workload and software optimizations for Intel hardware.

Redis Vector Search Optimization Guide

Table of Contents

Overview

SVS-VAMANA Configuration

Creating an SVS-VAMANA Index

Index Parameters

Vector Compression

Compression Options

Choosing Compression by Use Case

Example with LeanVec Compression

Performance Tuning

Runtime Query Parameters

Redis Configuration

Benchmarks

Memory Savings

Throughput Improvements (FP32)

Latency Improvements (FP32, High Concurrency)

Precision vs. Performance

Ingestion Trade-offs

FAQ

Q: When should I use SVS-VAMANA vs HNSW?

Q: Are LVQ and LeanVec available in Redis Open Source?

Q: What if recall is too low with compression?

Q: Does SVS-VAMANA work on non-Intel hardware?

References