Vamana Iterator
The svs::VamanaIterator
class is designed to facilitate efficient and restartable searches within a Vamana Index. It dynamically adjusts the batch size during successive calls to svs::VamanaIterator::next()
, ensuring that the required number of neighbors is retrieved for each query. To optimize subsequent searches, the iterator leverages an additional search buffer capacity, enabling it to explore nodes beyond the previous batch’s search window while avoiding revisiting already retrieved nodes. This design ensures high performance and scalability for iterative neighbor searches.
Note that svs::VamanaIterator::next()
conducts search and prepares the results for the next batch of neigbors. After this, the resulting neighbors can be retrieved by calling svs::VamanaIterator::results()
. The iterator is designed to be used in a loop, where each iteration retrieves a new batch of neighbors.
One can directly use svs::Vamana::batch_iterator()
or svs::DynamicVamana::batch_iterator()
to create a batch iterator for these indices. This method is a convenience wrapper around the svs::VamanaIterator
constructor, which allows for the creation of an iterator for the index.
-
class VamanaIterator
Type-erased wrapper for the low-level Vamana iterator.
Public Functions
-
template<typename Index, typename QueryType>
inline VamanaIterator(const Index &parent, std::span<const QueryType> query, size_t extra_search_buffer_capacity = svs::UNSIGNED_INTEGER_PLACEHOLDER) Construct a new batch iterator for the query over the index.
Argument
extra_search_buffer_capacity
is the extra search buffer capacity to use for the next search. This is used to ensure that we have few extra neighbors in the search buffer to accommodate the next search (when not provided,svs::ITERATOR_EXTRA_BUFFER_CAPACITY_DEFAULT = 100
is used.
-
inline svs::index::vamana::VamanaSearchParameters parameters_for_current_iteration() const
Return the search parameters used for the current batch.
-
inline svs::DataType query_type() const
Return the element type of the captured query.
-
inline size_t batch_number() const
Return the current batch number.
-
inline size_t size() const
Return the number of results for the current batch.
-
inline std::span<const svs::Neighbor<size_t>> results() const
Return a span of the results for the current batch.
-
inline void next(size_t batch_size, const lib::DefaultPredicate &cancel = lib::Returns(lib::Const<false>()))
Prepare a new batch of results.
After calling this method, previous results will no longer be available. This method invalidates previous values return by
results()
.- Parameters:
batch_size – The number of results to return in the next batch. In some scenarios (like when all entries are returned or if search is cancelled), results size can be lower than the
batch_size
.cancel – A predicate called during the search to determine if the search should be cancelled.
-
inline void restart_next_search() const
Signal that the next batch search should begin entirely from scratch.
The iterator records some internal state to accelerate future calls to
next()
. This caching of results may yield slightly different results than beginning index search completely over from the original entry points.Calling this method signals the iterator to abandon its cached state.
This can be helpful for measuring performance and verifying recall values.
-
inline bool done() const
Returns whether iterator can find more neighbors or not for the given query.
The iterator is considered done when all the available nodes have been yielded or when the search can not find any more neighbors. The transition from not done to done will be triggered by a call to
next()
. The contents ofbatch_number()
andparameters_for_current_iteration()
will then remain unchanged by subsequent invocations ofnext()
.
-
template<typename Index, typename QueryType>
Example
As with the top level index, a brief example showing the usage of the iterator is provided. We begin with the usual includes:
// SVS Dependencies
#include "svs/orchestrators/vamana.h" // bulk of the dependencies required.
// Alternative main definition
#include "svsmain.h"
// stl
#include <string>
#include <string_view>
#include <vector>
A helper routine is defined to initialize a tiny index:
[[nodiscard]] auto make_example_index() {
// Build the index.
auto build_parameters = svs::index::vamana::VamanaBuildParameters{
1.2, // alpha
16, // graph_max_degree
32, // window_size
16, // max_candidate_pool_size
16, // prune_to
true // use_full_search_history
};
// Create a dataset with 7 elements, each with 4 dimensions.
// The contents of the dataset are
// {0, 0, 0, 0}
// {1, 1, 1, 1}
// ...
// {6, 6, 6, 6}
auto data = svs::data::SimpleData<float>(7, 4);
// Fill data `i` with all `i`s.
auto buffer = std::vector<float>();
for (size_t i = 0; i < data.size(); ++i) {
auto fi = svs::lib::narrow<float>(i);
buffer = {fi, fi, fi, fi};
data.set_datum(i, buffer);
}
// Build the index.
return svs::index::vamana::auto_build(
build_parameters, std::move(data), svs::DistanceL2{}, 1
);
}
This initializes the index into a known state, allowing this example to more clearly reason about the neighbors that should be returned.
Constructing an Iterator
A code snippet for constructing a batch iterator is shown below with explanation following.
auto index = make_example_index();
// Each iteration will yield 3 elements that have not been yielded previously.
size_t batchsize = 3;
// Create a batch iterator over the index for the query.
auto itr = [&]() {
// Construct a query a query in a scoped block to demonstrate that the iterator
// maintains an internal copy.
auto query = std::vector<float>{3.25, 3.25, 3.25, 3.25};
// Make a batch iterator for the query.
return svs::VamanaIterator(index, svs::lib::as_const_span(query));
}();
Iterator construction requires three items:
The index (in this case,
svs::Vamana
)A query
An optional parameter
extra_search_buffer_capacity
When constructing the iterator, we provide the index with the query and an optional extra search buffer capacity to allow the iterator to search beyond the current batch. On construction, the iterator does not conduct any searches, but rather prepares itself for the first search by setting the search buffer capacity to the provided extra_search_buffer_capacity
value (if not provided, svs::ITERATOR_EXTRA_BUFFER_CAPACITY_DEFAULT = 100
is used). The iterator will then search for neighbors in the index when the svs::VamanaIterator::next()
method is called.
Upon construction, search_window_size
is initialized to 0, and search_buffer_capacity
is set to extra_search_buffer_capacity
. With each next(batch_size)
call, both search_window_size
and search_buffer_capacity
are incremented by batch_size
before conducting the search. This ensures that the buffer capacity remains greater than the search window size by extra_search_buffer_capacity
, facilitating efficient searches for subsequent batches without revisiting nodes that have already been retrieved.
The query is copied into the iterator and thus the query provided at the callsite need not outlive the iterator. This is demonstrated by constructing the query as a temporary variable in the lambda.
Using the Iterator
itr.next(batchsize);
CHECK(itr.size() == 3);
CHECK(itr.batch_number() == 1);
// Obtain a view of the current list candidates.
std::span<const svs::Neighbor<size_t>> results = itr.results();
CHECK(results.size() == 3);
// We constructed the dataset in such a way that we know what the results should be.
fmt::print("Neighbor 0 = {}\n", results[0].id());
fmt::print("Neighbor 1 = {}\n", results[1].id());
fmt::print("Neighbor 2 = {}\n", results[2].id());
CHECK(!itr.done());
CHECK(results[0].id() == 3);
CHECK(results[1].id() == 4);
CHECK(results[2].id() == 2);
Calling to svs::VamanaIterator::next()
will cause the iterator to search and prepare batchsize
number of neighbors in the index. These resulting neighbors can be retrieved with a call to svs::VamanaIterator::results()
.
This number of neighbors retrieved by the last next()
call can be seen by calling the svs::VamanaIterator::size()
method. The svs::VamanaIterator::batch_number()
method returns the number of batches of neighbors retrieved by next
calls so far.
The svs::VamanaIterator::results()
method will provide a view of the current candidates.
In the example, we purposely constructed the indexed dataset such that we can manually inspect the returned
results.
The neighbors returned by the batch iterator will be weakly monotonic, in the sense that earlier neighbors on earlier batches should generally be nearer to the query.
Stepping the Iterator
// This will yield the next batch of neighbors.
itr.next(batchsize);
CHECK(itr.size() == 3);
// The contents of the iterator are for batch 2.
CHECK(itr.batch_number() == 2);
// Update and inspect the results.
results = itr.results();
fmt::print("Neighbor 3 = {}\n", results[0].id());
fmt::print("Neighbor 4 = {}\n", results[1].id());
fmt::print("Neighbor 5 = {}\n", results[2].id());
CHECK(!itr.done());
CHECK(results[0].id() == 5);
CHECK(results[1].id() == 1);
CHECK(results[2].id() == 6);
If more candidates are required, the svs::VamanaIterator::next()
method can be invoked.
Each call to svs::VamanaIterator::next()
will yield a new batch of candidates, up to the provided argument value batchsize
.
The iterator uses some cached internal state to make future searches more efficient than starting from scratch.
Note
Calls to svs::VamanaIterator::next()
renders previous results unavailable and invalidates previous values returned by svs::VamanaIterator::results()
.
Exhausting the Candidate Pool
// So far, the iterator has yielded 6 of the 7 vectors in the dataset.
// This call to `next()` should only yield a single neighbor - the last on in the index.
itr.next(batchsize);
CHECK(itr.size() == 1);
CHECK(itr.done());
results = itr.results();
fmt::print("Neighbor 6 = {}\n", results[0].id());
CHECK(results[0].id() == 0);
When the iterator is close to exhausting all elements in the index, it may yield fewer then the configured batch-size, as is shown in the example.
After this batch completes, there are no more candidates in the index that haven’t been yielded.
The method svs::VamanaIterator::done()
will then return true.
In this state, future calls to svs::VamanaIterator::next()
will simply yield no candidates.
// Calling `next()` again should yield no more candidates.
itr.next(batchsize);
CHECK(itr.size() == 0);
CHECK(itr.done());
The full xample is provided at the bottom of this page.
Main BatchIterator Class
-
template<typename Index, typename QueryType>
class BatchIterator A batch iterator for retrieving neighbors from the index in batches.
This iterator abstracts the process of retrieving neighbors in fixed-size batches while maintaining internal state for efficient graph traversal.
Public Types
-
using iterator = typename result_buffer_type::iterator
Random-access iterator to
value_type
over the current batch of results.
-
using const_iterator = typename result_buffer_type::const_iterator
Random-access iterator to
const value_type
over the current batch of results.
Public Functions
-
inline BatchIterator(const Index &parent, std::span<const QueryType> query, size_t extra_search_buffer_capacity = svs::UNSIGNED_INTEGER_PLACEHOLDER)
Constructs a batch iterator for the given query over the index.
- Parameters:
parent – The index to search.
query – The query data.
extra_search_buffer_capacity – Additional buffer capacity for the search. When not provided,
svs::ITERATOR_EXTRA_BUFFER_CAPACITY_DEFAULT = 100
is used.
-
inline void update(std::span<const QueryType> newquery)
Updates the iterator with a new query. Resets the internal state and restarts the search when
next(...)
is called.
-
template<NeighborLike N>
inline svs::Neighbor<external_id_type> adapt(N internal) const Adapts an internal neighbor to an external neighbor.
-
inline const_iterator begin() const
Returns an iterator to the beginning of the results.
-
inline const_iterator end() const
Returns an iterator to the end of the results.
-
inline const_iterator cbegin() const
Returns an iterator to the beginning of the results.
-
inline const_iterator cend() const
Returns an iterator to the beginning of the results.
-
inline std::span<const value_type> contents() const
Returns a span over the current batch of neighbors. The span is invalidated by calls to
next(...)
.
-
inline size_t size() const
Returns the number of buffered results.
-
inline size_t batch_number() const
Return the batch number corresponding to the current buffer.
-
inline bool done() const
Returns whether iterator can find more neighbors or not for the given query.
The iterator is considered done when all the available nodes have been yielded or when the search can not find any more neighbors. The transition from not done to done will be triggered by a call to
next()
. The contents ofbatch_number()
andparameters_for_current_iteration()
will then remain unchanged by subsequent invocations ofnext()
.
-
inline void restart_next_search()
Forces the next iteration to restart the search from scratch.
-
inline vamana::VamanaSearchParameters parameters_for_current_iteration() const
Returns the search parameters used for the current batch.
-
inline void next(size_t batch_size, const lib::DefaultPredicate &cancel = lib::Returns(lib::Const<false>()))
Prepares the next batch of neighbors (up to
batch_size
) from the index. Handles exceptions gracefully and ensures iterator state consistency.
-
using iterator = typename result_buffer_type::iterator
Full Example
//! [Includes]
// SVS Dependencies
#include "svs/orchestrators/vamana.h" // bulk of the dependencies required.
// Alternative main definition
#include "svsmain.h"
// stl
#include <string>
#include <string_view>
#include <vector>
//! [Includes]
void check(bool value, std::string_view expr, svs::lib::detail::LineInfo linfo) {
if (!value) {
throw svs::lib::ANNException(
"expression \"{}\" evaluated to false in {}", expr, linfo
);
}
}
#define CHECK(expr) check(expr, #expr, SVS_LINEINFO);
//! [Example Index Construction]
[[nodiscard]] auto make_example_index() {
// Build the index.
auto build_parameters = svs::index::vamana::VamanaBuildParameters{
1.2, // alpha
16, // graph_max_degree
32, // window_size
16, // max_candidate_pool_size
16, // prune_to
true // use_full_search_history
};
// Create a dataset with 7 elements, each with 4 dimensions.
// The contents of the dataset are
// {0, 0, 0, 0}
// {1, 1, 1, 1}
// ...
// {6, 6, 6, 6}
auto data = svs::data::SimpleData<float>(7, 4);
// Fill data `i` with all `i`s.
auto buffer = std::vector<float>();
for (size_t i = 0; i < data.size(); ++i) {
auto fi = svs::lib::narrow<float>(i);
buffer = {fi, fi, fi, fi};
data.set_datum(i, buffer);
}
// Build the index.
return svs::index::vamana::auto_build(
build_parameters, std::move(data), svs::DistanceL2{}, 1
);
}
//! [Example Index Construction]
void demonstrate_iterator() {
//! [Setup]
auto index = make_example_index();
// Each iteration will yield 3 elements that have not been yielded previously.
size_t batchsize = 3;
// Create a batch iterator over the index for the query.
auto itr = [&]() {
// Construct a query a query in a scoped block to demonstrate that the iterator
// maintains an internal copy.
auto query = std::vector<float>{3.25, 3.25, 3.25, 3.25};
// Make a batch iterator for the query.
return svs::VamanaIterator(index, svs::lib::as_const_span(query));
}();
//! [Setup]
//! [First Iteration]
itr.next(batchsize);
CHECK(itr.size() == 3);
CHECK(itr.batch_number() == 1);
// Obtain a view of the current list candidates.
std::span<const svs::Neighbor<size_t>> results = itr.results();
CHECK(results.size() == 3);
// We constructed the dataset in such a way that we know what the results should be.
fmt::print("Neighbor 0 = {}\n", results[0].id());
fmt::print("Neighbor 1 = {}\n", results[1].id());
fmt::print("Neighbor 2 = {}\n", results[2].id());
CHECK(!itr.done());
CHECK(results[0].id() == 3);
CHECK(results[1].id() == 4);
CHECK(results[2].id() == 2);
//! [First Iteration]
//! [Next Iteration]
// This will yield the next batch of neighbors.
itr.next(batchsize);
CHECK(itr.size() == 3);
// The contents of the iterator are for batch 2.
CHECK(itr.batch_number() == 2);
// Update and inspect the results.
results = itr.results();
fmt::print("Neighbor 3 = {}\n", results[0].id());
fmt::print("Neighbor 4 = {}\n", results[1].id());
fmt::print("Neighbor 5 = {}\n", results[2].id());
CHECK(!itr.done());
CHECK(results[0].id() == 5);
CHECK(results[1].id() == 1);
CHECK(results[2].id() == 6);
//! [Next Iteration]
//! [Final Iteration]
// So far, the iterator has yielded 6 of the 7 vectors in the dataset.
// This call to `next()` should only yield a single neighbor - the last on in the index.
itr.next(batchsize);
CHECK(itr.size() == 1);
CHECK(itr.done());
results = itr.results();
fmt::print("Neighbor 6 = {}\n", results[0].id());
CHECK(results[0].id() == 0);
//! [Final Iteration]
//! [Beyond Final Iteration]
// Calling `next()` again should yield no more candidates.
itr.next(batchsize);
CHECK(itr.size() == 0);
CHECK(itr.done());
//! [Beyond Final Iteration]
}
// Alternative main definition
int svs_main(std::vector<std::string> SVS_UNUSED(args)) {
demonstrate_iterator();
return 0;
}
SVS_DEFINE_MAIN();