Similarity Search Optimization Guides
This section contains optimization guides for vector similarity search workloads on Intel hardware. These guides help users of popular vector search solutions achieve optimal performance on Intel Xeon processors.
Overview
Vector similarity search is a core component of modern AI applications including:
- Retrieval-Augmented Generation (RAG)
- Semantic search
- Recommendation systems
- Image and video similarity
- Anomaly detection
Intel Scalable Vector Search (SVS)
Intel Scalable Vector Search (SVS) is a high-performance library for vector similarity search, optimized for Intel hardware. SVS can be used directly as a standalone library, and is integrated into popular solutions to bring these optimizations to a wider audience.
SVS features:
- Vamana Algorithm: Graph-based approximate nearest neighbor search
- Vector Compression: LVQ and LeanVec for significant memory reduction
- Hardware Optimization: Best performance on servers with AVX-512 support
Understanding LVQ and LeanVec Compression
Traditional vector compression methods face limitations in graph-based search. Product Quantization (PQ) requires keeping full-precision vectors for re-ranking, which offsets much of the memory saving from compression. Standard scalar quantization with global bounds uses the available quantization levels inefficiently, because a single range must cover every vector in the dataset.
LVQ (Locally-adaptive Vector Quantization)
LVQ addresses these limitations by applying per-vector normalization and scalar quantization, adapting the quantization bounds individually for each vector. This local adaptation ensures efficient use of the available bit range, resulting in high-quality compressed representations.
Key benefits:
- Minimal decompression overhead enables fast, on-the-fly distance computations
- Significantly reduces memory bandwidth and storage requirements
- Maintains high search accuracy and throughput
- SIMD-optimized layout (Turbo LVQ) for efficient distance computations
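At 8 bits per dimension, LVQ achieves a roughly four-fold reduction of vector size relative to float32 while maintaining search accuracy. A typical 768-dimensional float32 vector requiring 3072 bytes is reduced to about 768 bytes of codes plus a small amount of per-vector metadata for the quantization bounds; 4-bit encodings reduce it further.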
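The per-vector adaptation described above can be illustrated with a minimal NumPy sketch. This is not the SVS implementation: the function names `lvq_encode` and `lvq_decode` are hypothetical, codes are stored one per byte rather than in a packed SIMD layout, and any dataset-level preprocessing is omitted.

```python
import numpy as np

def lvq_encode(x, bits=8):
    """Quantize one vector using its own min/max as the quantization bounds."""
    lo, hi = float(x.min()), float(x.max())
    levels = (1 << bits) - 1
    # Guard against a constant vector, where hi == lo.
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = np.clip(np.round((x - lo) / step), 0, levels).astype(np.uint8)
    return codes, lo, step  # lo and step are the per-vector metadata

def lvq_decode(codes, lo, step):
    """Reconstruct an approximation of the original vector."""
    return codes.astype(np.float32) * step + lo

rng = np.random.default_rng(0)
x = rng.standard_normal(768).astype(np.float32)  # synthetic stand-in for an embedding

codes, lo, step = lvq_encode(x, bits=8)
x_hat = lvq_decode(codes, lo, step)
max_err = float(np.abs(x - x_hat).max())

print(x.nbytes, "->", codes.nbytes, "bytes of codes")
print("max reconstruction error:", max_err, "(bounded by step / 2 =", step / 2, ")")
```

Because the bounds come from the vector itself, the reconstruction error is limited to half of that vector's own quantization step, instead of half of a step sized for the whole dataset.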
LeanVec (LVQ with Dimensionality Reduction)
LeanVec builds on LVQ by first applying linear dimensionality reduction, then compressing the reduced vectors with LVQ. This two-step approach significantly cuts memory and compute costs, enabling faster similarity search and index construction with minimal accuracy loss. It is especially effective for high-dimensional deep learning embeddings.
Best suited for:
- High-dimensional vectors (768+ dimensions)
- Text embeddings from large language models
- Cases where maximum memory savings are needed
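The two-step idea (linear reduction, then per-vector quantization) can be sketched as follows. This is an illustration under stated assumptions, not how SVS necessarily learns its projection: PCA via SVD is used as a generic linear reduction, the target dimension of 160 is an arbitrary choice, the data is synthetic, and `fit_projection` and `quantize_rows` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 768-dimensional data with low intrinsic dimensionality plus noise.
latent = rng.standard_normal((1000, 64)).astype(np.float32)
mixing = rng.standard_normal((64, 768)).astype(np.float32)
noise = 0.01 * rng.standard_normal((1000, 768)).astype(np.float32)
data = latent @ mixing + noise

def fit_projection(data, target_dim):
    """Step 1: learn a linear projection (PCA here, as an assumption)."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:target_dim].T  # projection matrix of shape (D, d)

def quantize_rows(m, bits=8):
    """Step 2: scalar-quantize each row using that row's own min/max."""
    lo = m.min(axis=1, keepdims=True)
    hi = m.max(axis=1, keepdims=True)
    levels = (1 << bits) - 1
    step = np.where(hi > lo, (hi - lo) / levels, 1.0)
    codes = np.clip(np.round((m - lo) / step), 0, levels).astype(np.uint8)
    return codes, lo, step

mean, proj = fit_projection(data, target_dim=160)
reduced = (data - mean) @ proj                 # (1000, 160) float32
codes, lo, step = quantize_rows(reduced, bits=8)

print("float32 bytes per vector:", data.shape[1] * 4)
print("reduced + 8-bit bytes per vector:", codes.shape[1])
```

In this sketch the code payload shrinks both because there are fewer dimensions and because each remaining dimension uses fewer bits.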
Two-Level Compression
Both LVQ and LeanVec support two-level compression schemes:
- Level 1: Fast candidate retrieval using compressed vectors
- Level 2: Re-ranking for accuracy (LVQ encodes residuals, LeanVec encodes the full dimensionality data)
The naming convention reflects bits per dimension at each level:
- LVQ4x8: 4 bits for Level 1, 8 bits for Level 2 (12 bits total per dimension)
- LVQ8: Single-level, 8 bits per dimension
- LeanVec4x8: 4-bit Level 1 encoding of reduced dimensionality data + 8-bit Level 2 encoding of full dimensionality data
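For the LVQ case, where Level 2 encodes residuals, a minimal sketch of a 4x8 scheme might look like the following. It assumes the residual of the Level 1 code is itself scalar-quantized over the interval of half a Level 1 step on either side of zero; the helper `quantize` is hypothetical and the actual SVS encoding and layout may differ.

```python
import numpy as np

def quantize(v, lo, hi, bits):
    """Uniform scalar quantization of v over the interval [lo, hi]."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    codes = np.clip(np.round((v - lo) / step), 0, levels).astype(np.uint8)
    return codes, step

rng = np.random.default_rng(0)
x = rng.standard_normal(768)  # synthetic vector, float64 for simplicity

# Level 1: coarse 4-bit codes, used for fast candidate retrieval.
lo, hi = x.min(), x.max()
c1, step1 = quantize(x, lo, hi, bits=4)
x1 = c1 * step1 + lo

# Level 2: 8-bit codes for the residual, which lies within +/- step1 / 2.
residual = x - x1
c2, step2 = quantize(residual, -step1 / 2, step1 / 2, bits=8)
x2 = x1 + (c2 * step2 - step1 / 2)

err1 = np.abs(x - x1).max()  # error when only Level 1 is used
err2 = np.abs(x - x2).max()  # error after adding the Level 2 residual

print("Level 1 only:", err1)
print("Level 1 + 2: ", err2)
```

Search can traverse the graph using only the Level 1 codes and then add the residual for the final candidates, which is why the second level improves accuracy without slowing the bulk of the distance computations.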
Vector Compression Selection
| Compression | Best For | Observations |
|---|---|---|
| LVQ4x4 | Fast search and low memory use | Consider LeanVec for even faster search |
| LeanVec4x8 | Fastest search and ingestion | LeanVec dimensionality reduction might reduce recall |
| LVQ4 | Maximum memory saving | Recall might be insufficient |
| LVQ8 | Faster ingestion than LVQ4x4 | Search likely slower than LVQ4x4 |
| LeanVec8x8 | Improved recall when LeanVec4x8 is insufficient | LeanVec dimensionality reduction might reduce recall |
| LVQ4x8 | Improved recall when LVQ4x4 is insufficient | Slightly worse memory savings |
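To compare the memory side of these options, the naming convention can be turned into a rough per-vector size estimate. The function `payload_bytes` below is a hypothetical helper: it counts code bits only, ignoring per-vector metadata, padding, and the graph itself, and the LeanVec reduced dimensionality (`reduced_dims`) is an input you must supply since it depends on configuration.

```python
import re

def payload_bytes(scheme, dims, reduced_dims=None):
    """Approximate code payload per vector, in bytes, from the scheme name."""
    m = re.fullmatch(r"(LVQ|LeanVec)(\d+)(?:x(\d+))?", scheme)
    if m is None:
        raise ValueError(f"unrecognized scheme: {scheme}")
    family = m.group(1)
    level1_bits = int(m.group(2))
    level2_bits = int(m.group(3) or 0)  # 0 for single-level schemes
    if family == "LVQ":
        # Both levels cover the full dimensionality.
        bits = dims * (level1_bits + level2_bits)
    else:
        if reduced_dims is None:
            raise ValueError("LeanVec requires reduced_dims")
        # Level 1 covers reduced dimensions, Level 2 the full dimensions.
        bits = reduced_dims * level1_bits + dims * level2_bits
    return (bits + 7) // 8  # round up to whole bytes

dims = 768
print("float32:", dims * 4)
for name in ["LVQ4", "LVQ4x4", "LVQ8", "LVQ4x8"]:
    print(name, payload_bytes(name, dims))
# reduced_dims=192 is an arbitrary illustrative value.
print("LeanVec4x8", payload_bytes("LeanVec4x8", dims, reduced_dims=192))
```

Note that for LeanVec the Level 1 payload, which is what the graph traversal reads, is much smaller than the total, which is consistent with the table listing it as the fastest option for search.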
Rule of thumb:
- Dimensions < 768 → Use LVQ (LVQ4x4, LVQ4x8, or LVQ8)
- Dimensions ≥ 768 → Use LeanVec (LeanVec4x8 or LeanVec8x8)
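The rule of thumb can be encoded as a small starting-point helper. The function name `suggest_schemes` is hypothetical, and the result is only a shortlist to benchmark against your own recall and latency targets.

```python
def suggest_schemes(dims):
    """Return candidate compression schemes for a given vector dimensionality."""
    if dims < 768:
        return ["LVQ4x4", "LVQ4x8", "LVQ8"]
    return ["LeanVec4x8", "LeanVec8x8"]

print(suggest_schemes(384))
print(suggest_schemes(1536))
```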
Available Guides
| Software | Description | Guide |
|---|---|---|
| Redis | Redis Query Engine with SVS-VAMANA | Redis Guide |