Redis Vector Search Optimization Guide

This guide describes best practices for optimizing vector similarity search performance in Redis on Intel Xeon processors. Redis 8.2+ includes SVS-VAMANA, a graph-based vector index with Intel’s compression technologies (LVQ and LeanVec) from Intel’s Scalable Vector Search (SVS) library.

Overview

Redis Query Engine supports three vector index types: FLAT, HNSW, and SVS-VAMANA. SVS-VAMANA combines the Vamana graph-based search algorithm (Subramanya et al., NeurIPS 2019) with Intel’s compression technologies (LVQ and LeanVec), delivering optimal performance on servers with AVX-512 support.

Key Benefits of SVS-VAMANA:

SVS-VAMANA Configuration

Creating an SVS-VAMANA Index

FT.CREATE my_index
  ON HASH
  PREFIX 1 doc:
  SCHEMA embedding VECTOR SVS-VAMANA 12
    TYPE FLOAT32
    DIM 768
    DISTANCE_METRIC COSINE
    GRAPH_MAX_DEGREE 64
    CONSTRUCTION_WINDOW_SIZE 200
    COMPRESSION LVQ4x8
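
The number after SVS-VAMANA (12 above) counts every attribute name and every value that follows it, so it has to change whenever an attribute is added or removed. As a minimal sketch, the helper below derives that count from a dictionary instead of hard-coding it. The function name `svs_vamana_create_args` is hypothetical, and sending the command assumes redis-py and a reachable Redis 8.2+ server.

```python
def svs_vamana_create_args(index, prefix, field, attrs):
    """Build the argument list for FT.CREATE with one SVS-VAMANA vector field.

    `attrs` maps attribute names (TYPE, DIM, ...) to values. The count that
    follows SVS-VAMANA covers names and values, so it equals 2 * len(attrs).
    """
    flat = []
    for name, value in attrs.items():
        flat += [name, str(value)]
    return [
        "FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix,
        "SCHEMA", field, "VECTOR", "SVS-VAMANA", str(len(flat)), *flat,
    ]


args = svs_vamana_create_args(
    "my_index", "doc:", "embedding",
    {
        "TYPE": "FLOAT32",
        "DIM": 768,
        "DISTANCE_METRIC": "COSINE",
        "GRAPH_MAX_DEGREE": 64,
        "CONSTRUCTION_WINDOW_SIZE": 200,
        "COMPRESSION": "LVQ4x8",
    },
)
print(" ".join(args))

# Assumption: redis-py is installed and a server is listening on localhost.
#   import redis
#   redis.Redis().execute_command(*args)
```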

Index Parameters

| Parameter | Description | Default | Tuning Guidance |
| --- | --- | --- | --- |
| TYPE | Vector data type (FLOAT16, FLOAT32) | - | FLOAT32 for accuracy, FLOAT16 for memory |
| DIM | Vector dimensions | - | Must match your embeddings |
| DISTANCE_METRIC | L2, IP, or COSINE | - | Match the metric your embedding model expects; for unit-normalized embeddings all three produce the same ranking |
| GRAPH_MAX_DEGREE | Max edges per node | 32 | Equivalent to HNSW’s M × 2; higher = better recall, more memory |
| CONSTRUCTION_WINDOW_SIZE | Build search window | 200 | Higher = better graph quality, slower build |
| SEARCH_WINDOW_SIZE | Query search window | 10 | Higher = better recall, slower |
| REDUCE | Target dimension for LeanVec | DIM/2 | Lower = faster search, may reduce recall |
| COMPRESSION | LVQ/LeanVec type | none | See compression section |
| TRAINING_THRESHOLD | Vectors for learning compression | 10240 | Increase if recall is low |

Vector Compression

Intel SVS provides advanced compression techniques that reduce memory usage while maintaining search quality.

Compression Options

| Compression | Bits/Dim | Memory Reduction | Best For |
| --- | --- | --- | --- |
| None | 32 (FLOAT32) | 1x (baseline) | Maximum accuracy |
| LVQ8 | 8 | ~4x | Fast ingestion, good balance |
| LVQ4x4 | 4+4 | ~4x | Fast search, dimensions < 768 |
| LVQ4x8 | 4+8 | ~2.5x | High recall with compression |
| LeanVec4x8 | 4/f+8 | ~3x | High-dimensional vectors (768+) |
| LeanVec8x8 | 8/f+8 | ~2.5x | Best recall with LeanVec |

The LeanVec dimensionality reduction factor f is the full dimensionality divided by the reduced dimensionality.
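
To make the Bits/Dim column concrete, the sketch below evaluates it for a given DIM and REDUCE and compares the result with 32-bit FLOAT32 storage. It assumes the plain reading of the table (first-level bits, divided by f for LeanVec, plus second-level bits) and ignores per-vector constants and graph storage, so measured memory will differ from these nominal figures. The function names are illustrative and not part of Redis or SVS.

```python
def bits_per_dim(compression, dim=None, reduce=None):
    """Nominal bits stored per original dimension, following the table above."""
    lvq = {"LVQ8": 8, "LVQ4x4": 4 + 4, "LVQ4x8": 4 + 8}
    if compression in lvq:
        return lvq[compression]
    first_level = {"LeanVec4x8": 4, "LeanVec8x8": 8}[compression]
    f = dim / reduce  # LeanVec reduction factor: full dims / reduced dims
    return first_level / f + 8


def nominal_reduction(compression, dim=None, reduce=None, baseline_bits=32):
    """Reduction factor versus uncompressed FLOAT32 vectors."""
    return baseline_bits / bits_per_dim(compression, dim, reduce)


# DIM 1536 with REDUCE 384 gives f = 4, so LeanVec4x8 stores 4/4 + 8 = 9 bits.
print(bits_per_dim("LeanVec4x8", dim=1536, reduce=384))
# With the default REDUCE of DIM/2 (f = 2) the same scheme stores 10 bits.
print(bits_per_dim("LeanVec4x8", dim=1536, reduce=768))
print(round(nominal_reduction("LVQ4x8"), 2))
```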

Choosing Compression by Use Case

| Embedding Category | Example Embeddings | Compression Strategy |
| --- | --- | --- |
| Text Embeddings | Cohere embed-v3 (1024), OpenAI ada-002 (1536) | LeanVec4x8 |
| Image Embeddings | ResNet-152 (2048), ViT (768+) | LeanVec4x8 |
| Multimodal | CLIP ViT-B/32 (512) | LVQ8 |
| Lower Dimensional | Custom embeddings (<768) | LVQ4x4 or LVQ4x8 |

Example with LeanVec Compression

FT.CREATE my_index
  ON HASH
  PREFIX 1 doc:
  SCHEMA embedding VECTOR SVS-VAMANA 12
    TYPE FLOAT32
    DIM 1536
    DISTANCE_METRIC COSINE
    COMPRESSION LeanVec4x8
    REDUCE 384
    TRAINING_THRESHOLD 20000

Performance Tuning

Runtime Query Parameters

Adjust search parameters at query time for precision/performance trade-offs:

FT.SEARCH my_index
  "*=>[KNN 10 @embedding $BLOB SEARCH_WINDOW_SIZE $SW]"
  PARAMS 4 BLOB "\x12\xa9..." SW 50
  DIALECT 2

| Parameter | Effect | Trade-off |
| --- | --- | --- |
| SEARCH_WINDOW_SIZE | Larger = higher recall | Higher latency |
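
The `$BLOB` parameter carries the query vector as raw FLOAT32 bytes. The sketch below shows one way to produce those bytes and assemble the same FT.SEARCH arguments from Python using only the standard library. The helper names are hypothetical, little-endian packing is an assumption that matches native byte order on x86 and typical ARM hosts, and sending the command assumes redis-py and a running server.

```python
import struct


def to_float32_blob(vector):
    """Pack a sequence of numbers as little-endian FLOAT32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def knn_search_args(index, field, vector, k=10, search_window_size=50):
    """Build FT.SEARCH arguments for a KNN query with a per-query window size."""
    query = f"*=>[KNN {k} @{field} $BLOB SEARCH_WINDOW_SIZE $SW]"
    return [
        "FT.SEARCH", index, query,
        # PARAMS 4 = two names plus two values
        "PARAMS", "4", "BLOB", to_float32_blob(vector), "SW", str(search_window_size),
        "DIALECT", "2",
    ]


query_vector = [0.0] * 768  # placeholder; use a real embedding with DIM entries
search_args = knn_search_args("my_index", "embedding", query_vector)
print(len(search_args[6]), "bytes in query blob")

# Assumption: redis-py is installed and a server is listening on localhost.
#   import redis
#   redis.Redis().execute_command(*search_args)
```

A blob whose length is not 4 × DIM bytes (for FLOAT32) will not match the index definition, so checking the length before sending is a cheap sanity test.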

Redis Configuration

# redis.conf optimizations for vector workloads

# Use multiple I/O threads for better throughput
io-threads 4
io-threads-do-reads yes

Benchmarks

Based on Redis benchmarking, SVS-VAMANA delivers significant improvements over HNSW:

Memory Savings

SVS-VAMANA with LVQ8 compression achieves consistent memory reductions across datasets (LVQ8 used as a common baseline; for Cohere and DBpedia embeddings, LeanVec is recommended in production — see Compression section):

| Dataset | Dimensions | Total Memory Reduction | Index Memory Reduction |
| --- | --- | --- | --- |
| LAION | 512 | 26% | 51% |
| Cohere | 768 | 35% | 70% |
| DBpedia | 1536 | 37% | 74% |
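
The two columns are related by simple arithmetic. If the savings come only from the vector index and all other memory (keys, hash fields, overhead) is unchanged, then total reduction = index share × index reduction. The sketch below rearranges that to estimate the index's share of baseline memory from the reported figures. The unchanged-non-index-memory assumption is mine, not a statement from the benchmark, so treat the result as a rough consistency check.

```python
def implied_index_share(total_reduction, index_reduction):
    """Estimate index memory as a fraction of baseline total memory.

    Assumes savings come only from the index:
        total_reduction = index_share * index_reduction
    """
    return total_reduction / index_reduction


reported = [
    ("LAION", 0.26, 0.51),
    ("Cohere", 0.35, 0.70),
    ("DBpedia", 0.37, 0.74),
]
for dataset, total, index in reported:
    print(dataset, round(implied_index_share(total, index), 2))
```

Under that assumption, the index accounts for roughly half of baseline memory in all three datasets, which is why the total reduction is about half the index reduction.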

Throughput Improvements (FP32)

At 0.95 precision, compared to HNSW:

| Dataset | Dimensions | QPS Improvement |
| --- | --- | --- |
| Cohere | 768 | Up to 144% higher |
| DBpedia | 1536 | Up to 60% higher |
| LAION | 512 | 0-15% (marginal) |

SVS-VAMANA is most effective at improving throughput for medium-to-high dimensional embeddings (768+ dimensions).

Latency Improvements (FP32, High Concurrency)

| Dataset | p50 Latency Reduction | p95 Latency Reduction |
| --- | --- | --- |
| Cohere (768d) | 60% | 57% |
| DBpedia (1536d) | 46% | 36% |

Precision vs. Performance

At every precision point from ~0.92 to 0.99, SVS-VAMANA matches HNSW accuracy while delivering higher throughput. At high precision (0.99), SVS-VAMANA sustains up to 1.5x better throughput.

Ingestion Trade-offs

SVS-VAMANA index construction is slower than HNSW due to graph construction complexity and compression processing. On x86 platforms:

This trade-off is acceptable for workloads where query performance and memory efficiency are priorities.

FAQ

Q: When should I use SVS-VAMANA vs HNSW?

A: Use SVS-VAMANA when:

Use HNSW when:

Q: Are LVQ and LeanVec available in Redis Open Source?

A: The basic SVS-VAMANA algorithm with 8-bit scalar quantization (SQ8) is available in Redis Open Source on all platforms. Intel’s LVQ and LeanVec optimizations require:

On non-Intel platforms (AMD, ARM), SVS-VAMANA automatically falls back to SQ8 compression—no code changes required.

Q: What if recall is too low with compression?

A: Try these steps in order:

  1. Increase SEARCH_WINDOW_SIZE at query time
  2. For LeanVec, try a larger REDUCE value (closer to original dimensions)
  3. Increase TRAINING_THRESHOLD (e.g., 50000) if using LeanVec
  4. Switch to higher-bit compression (LVQ4x4 → LVQ4x8, or LeanVec4x8 → LeanVec8x8)
  5. Increase GRAPH_MAX_DEGREE (e.g., 64 or 128)
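
Working through these steps requires a way to measure recall after each change. One common approach, sketched below, compares the IDs returned by the compressed index with exact top-k results (for example from a FLAT index over the same data) for a set of test queries. The function names and document IDs are illustrative only.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbours that the approximate search returned."""
    truth = set(exact_ids[:k])
    return len(truth.intersection(approx_ids[:k])) / k


def mean_recall(approx_results, exact_results, k):
    """Average recall@k over paired lists of per-query results."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)


# Toy example: 3 of the 4 exact neighbours were found.
exact = ["doc:1", "doc:2", "doc:3", "doc:4"]
approx = ["doc:1", "doc:3", "doc:9", "doc:2"]
print(recall_at_k(approx, exact, k=4))  # 0.75
```

Re-running the same query set after each adjustment shows whether a step actually improved recall before moving on to the next, more expensive one.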

Q: Does SVS-VAMANA work on non-Intel hardware?

A: Yes! The API is unified and SVS-VAMANA runs on any x86 or ARM platform—no code changes needed. The library automatically selects the best available implementation:

Your application code stays the same regardless of hardware. Ideal performance is achieved on Intel Xeon with AVX-512, but you can deploy and test on any platform without modification.
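
When comparing results across machines, it helps to know whether a host exposes AVX-512 at all. The sketch below parses Linux `/proc/cpuinfo` text for the `avx512f` (AVX-512 Foundation) flag. It reports the CPU's instruction set only and says nothing about which implementation Redis selected, so it is a rough check on the hardware rather than a view into Redis. The function names are illustrative.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags listed in Linux /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 kernels label the list "flags"; ARM kernels label it "Features".
        if key.strip() in ("flags", "Features"):
            flags.update(value.split())
    return flags


def has_avx512(cpuinfo_text):
    """True if the AVX-512 Foundation flag is present."""
    return "avx512f" in cpu_flags(cpuinfo_text)


sample_x86 = "processor\t: 0\nflags\t\t: fpu sse2 avx2 avx512f avx512bw\n"
sample_arm = "processor\t: 0\nFeatures\t: fp asimd crc32\n"
print(has_avx512(sample_x86), has_avx512(sample_arm))

# On a Linux host:
#   with open("/proc/cpuinfo") as fh:
#       print(has_avx512(fh.read()))
```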

References