Vamana Iterator

The svs::VamanaIterator class is designed to facilitate efficient and restartable searches within a Vamana Index. It dynamically adjusts the batch size during successive calls to svs::VamanaIterator::next(), ensuring that the required number of neighbors is retrieved for each query. To optimize subsequent searches, the iterator leverages an additional search buffer capacity, enabling it to explore nodes beyond the previous batch’s search window while avoiding revisiting already retrieved nodes. This design ensures high performance and scalability for iterative neighbor searches. Note that svs::VamanaIterator::next() conducts search and prepares the results for the next batch of neigbors. After this, the resulting neighbors can be retrieved by calling svs::VamanaIterator::results(). The iterator is designed to be used in a loop, where each iteration retrieves a new batch of neighbors.

One can directly use svs::Vamana::batch_iterator() or svs::DynamicVamana::batch_iterator() to create a batch iterator for these indices. This method is a convenience wrapper around the svs::VamanaIterator constructor, which allows for the creation of an iterator for the index.

class VamanaIterator

Type-erased wrapper for the low-level Vamana iterator.

Public Functions

template<typename Index, typename QueryType>
inline VamanaIterator(const Index &parent, std::span<const QueryType> query, size_t extra_search_buffer_capacity = svs::UNSIGNED_INTEGER_PLACEHOLDER)

Construct a new batch iterator for the query over the index.

Argument extra_search_buffer_capacity is the extra search buffer capacity to use for the next search. This is used to ensure that we have few extra neighbors in the search buffer to accommodate the next search (when not provided, svs::ITERATOR_EXTRA_BUFFER_CAPACITY_DEFAULT = 100 is used.

inline svs::index::vamana::VamanaSearchParameters parameters_for_current_iteration() const

Return the search parameters used for the current batch.

inline svs::DataType query_type() const

Return the element type of the captured query.

inline size_t batch_number() const

Return the current batch number.

inline size_t size() const

Return the number of results for the current batch.

inline std::span<const svs::Neighbor<size_t>> results() const

Return a span of the results for the current batch.

inline void next(size_t batch_size, const lib::DefaultPredicate &cancel = lib::Returns(lib::Const<false>()))

Prepare a new batch of results.

After calling this method, previous results will no longer be available. This method invalidates previous values return by results().

Parameters:
  • batch_size – The number of results to return in the next batch. In some scenarios (like when all entries are returned or if search is cancelled), results size can be lower than the batch_size.

  • cancel – A predicate called during the search to determine if the search should be cancelled.

inline void restart_next_search() const

Signal that the next batch search should begin entirely from scratch.

The iterator records some internal state to accelerate future calls to next(). This caching of results may yield slightly different results than beginning index search completely over from the original entry points.

Calling this method signals the iterator to abandon its cached state.

This can be helpful for measuring performance and verifying recall values.

inline bool done() const

Returns whether iterator can find more neighbors or not for the given query.

The iterator is considered done when all the available nodes have been yielded or when the search can not find any more neighbors. The transition from not done to done will be triggered by a call to next(). The contents of batch_number() and parameters_for_current_iteration() will then remain unchanged by subsequent invocations of next().

template<typename QueryType>
inline void update(std::span<const QueryType> newquery)

Update the iterator with a new query.

Example

As with the top level index, a brief example showing the usage of the iterator is provided. We begin with the usual includes:

// SVS Dependencies
#include "svs/orchestrators/vamana.h" // bulk of the dependencies required.

// Alternative main definition
#include "svsmain.h"

// stl
#include <string>
#include <string_view>
#include <vector>

A helper routine is defined to initialize a tiny index:

[[nodiscard]] auto make_example_index() {
    // Build the index.
    auto build_parameters = svs::index::vamana::VamanaBuildParameters{
        1.2, // alpha
        16,  // graph_max_degree
        32,  // window_size
        16,  // max_candidate_pool_size
        16,  // prune_to
        true // use_full_search_history
    };

    // Create a dataset with 7 elements, each with 4 dimensions.
    // The contents of the dataset are
    // {0, 0, 0, 0}
    // {1, 1, 1, 1}
    // ...
    // {6, 6, 6, 6}
    auto data = svs::data::SimpleData<float>(7, 4);

    // Fill data `i` with all `i`s.
    auto buffer = std::vector<float>();
    for (size_t i = 0; i < data.size(); ++i) {
        auto fi = svs::lib::narrow<float>(i);
        buffer = {fi, fi, fi, fi};
        data.set_datum(i, buffer);
    }

    // Build the index.
    return svs::index::vamana::auto_build(
        build_parameters, std::move(data), svs::DistanceL2{}, 1
    );
}

This initializes the index into a known state, allowing this example to more clearly reason about the neighbors that should be returned.

Constructing an Iterator

A code snippet for constructing a batch iterator is shown below with explanation following.

auto index = make_example_index();

// Each iteration will yield 3 elements that have not been yielded previously.
size_t batchsize = 3;

// Create a batch iterator over the index for the query.
auto itr = [&]() {
    // Construct a query a query in a scoped block to demonstrate that the iterator
    // maintains an internal copy.
    auto query = std::vector<float>{3.25, 3.25, 3.25, 3.25};

    // Make a batch iterator for the query.
    return svs::VamanaIterator(index, svs::lib::as_const_span(query));
}();

Iterator construction requires three items:

  • The index (in this case, svs::Vamana)

  • A query

  • An optional parameter extra_search_buffer_capacity

When constructing the iterator, we provide the index with the query and an optional extra search buffer capacity to allow the iterator to search beyond the current batch. On construction, the iterator does not conduct any searches, but rather prepares itself for the first search by setting the search buffer capacity to the provided extra_search_buffer_capacity value (if not provided, svs::ITERATOR_EXTRA_BUFFER_CAPACITY_DEFAULT = 100 is used). The iterator will then search for neighbors in the index when the svs::VamanaIterator::next() method is called.

Upon construction, search_window_size is initialized to 0, and search_buffer_capacity is set to extra_search_buffer_capacity. With each next(batch_size) call, both search_window_size and search_buffer_capacity are incremented by batch_size before conducting the search. This ensures that the buffer capacity remains greater than the search window size by extra_search_buffer_capacity, facilitating efficient searches for subsequent batches without revisiting nodes that have already been retrieved.

The query is copied into the iterator and thus the query provided at the callsite need not outlive the iterator. This is demonstrated by constructing the query as a temporary variable in the lambda.

Using the Iterator

itr.next(batchsize);
CHECK(itr.size() == 3);
CHECK(itr.batch_number() == 1);

// Obtain a view of the current list candidates.
std::span<const svs::Neighbor<size_t>> results = itr.results();
CHECK(results.size() == 3);

// We constructed the dataset in such a way that we know what the results should be.
fmt::print("Neighbor 0 = {}\n", results[0].id());
fmt::print("Neighbor 1 = {}\n", results[1].id());
fmt::print("Neighbor 2 = {}\n", results[2].id());
CHECK(!itr.done());
CHECK(results[0].id() == 3);
CHECK(results[1].id() == 4);
CHECK(results[2].id() == 2);

Calling to svs::VamanaIterator::next() will cause the iterator to search and prepare batchsize number of neighbors in the index. These resulting neighbors can be retrieved with a call to svs::VamanaIterator::results(). This number of neighbors retrieved by the last next() call can be seen by calling the svs::VamanaIterator::size() method. The svs::VamanaIterator::batch_number() method returns the number of batches of neighbors retrieved by next calls so far.

The svs::VamanaIterator::results() method will provide a view of the current candidates. In the example, we purposely constructed the indexed dataset such that we can manually inspect the returned results.

The neighbors returned by the batch iterator will be weakly monotonic, in the sense that earlier neighbors on earlier batches should generally be nearer to the query.

Stepping the Iterator

// This will yield the next batch of neighbors.
itr.next(batchsize);
CHECK(itr.size() == 3);

// The contents of the iterator are for batch 2.
CHECK(itr.batch_number() == 2);

// Update and inspect the results.
results = itr.results();
fmt::print("Neighbor 3 = {}\n", results[0].id());
fmt::print("Neighbor 4 = {}\n", results[1].id());
fmt::print("Neighbor 5 = {}\n", results[2].id());
CHECK(!itr.done());
CHECK(results[0].id() == 5);
CHECK(results[1].id() == 1);
CHECK(results[2].id() == 6);

If more candidates are required, the svs::VamanaIterator::next() method can be invoked. Each call to svs::VamanaIterator::next() will yield a new batch of candidates, up to the provided argument value batchsize. The iterator uses some cached internal state to make future searches more efficient than starting from scratch.

Note

Calls to svs::VamanaIterator::next() renders previous results unavailable and invalidates previous values returned by svs::VamanaIterator::results().

Exhausting the Candidate Pool

// So far, the iterator has yielded 6 of the 7 vectors in the dataset.
// This call to `next()` should only yield a single neighbor - the last on in the index.
itr.next(batchsize);
CHECK(itr.size() == 1);
CHECK(itr.done());
results = itr.results();
fmt::print("Neighbor 6 = {}\n", results[0].id());
CHECK(results[0].id() == 0);

When the iterator is close to exhausting all elements in the index, it may yield fewer then the configured batch-size, as is shown in the example. After this batch completes, there are no more candidates in the index that haven’t been yielded. The method svs::VamanaIterator::done() will then return true. In this state, future calls to svs::VamanaIterator::next() will simply yield no candidates.

// Calling `next()` again should yield no more candidates.
itr.next(batchsize);
CHECK(itr.size() == 0);
CHECK(itr.done());

The full xample is provided at the bottom of this page.

Main BatchIterator Class

template<typename Index, typename QueryType>
class BatchIterator

A batch iterator for retrieving neighbors from the index in batches.

This iterator abstracts the process of retrieving neighbors in fixed-size batches while maintaining internal state for efficient graph traversal.

Public Types

using iterator = typename result_buffer_type::iterator

Random-access iterator to value_type over the current batch of results.

using const_iterator = typename result_buffer_type::const_iterator

Random-access iterator to const value_type over the current batch of results.

Public Functions

inline BatchIterator(const Index &parent, std::span<const QueryType> query, size_t extra_search_buffer_capacity = svs::UNSIGNED_INTEGER_PLACEHOLDER)

Constructs a batch iterator for the given query over the index.

Parameters:
  • parent – The index to search.

  • query – The query data.

  • extra_search_buffer_capacity – Additional buffer capacity for the search. When not provided, svs::ITERATOR_EXTRA_BUFFER_CAPACITY_DEFAULT = 100 is used.

inline void update(std::span<const QueryType> newquery)

Updates the iterator with a new query. Resets the internal state and restarts the search when next(...) is called.

template<NeighborLike N>
inline svs::Neighbor<external_id_type> adapt(N internal) const

Adapts an internal neighbor to an external neighbor.

inline iterator begin()

Returns an iterator to the beginning of the results.

inline iterator end()

Returns an iterator to the end of the results.

inline const_iterator begin() const

Returns an iterator to the beginning of the results.

inline const_iterator end() const

Returns an iterator to the end of the results.

inline const_iterator cbegin() const

Returns an iterator to the beginning of the results.

inline const_iterator cend() const

Returns an iterator to the beginning of the results.

inline std::span<const value_type> contents() const

Returns a span over the current batch of neighbors. The span is invalidated by calls to next(...).

inline size_t size() const

Returns the number of buffered results.

inline size_t batch_number() const

Return the batch number corresponding to the current buffer.

inline bool done() const

Returns whether iterator can find more neighbors or not for the given query.

The iterator is considered done when all the available nodes have been yielded or when the search can not find any more neighbors. The transition from not done to done will be triggered by a call to next(). The contents of batch_number() and parameters_for_current_iteration() will then remain unchanged by subsequent invocations of next().

inline void restart_next_search()

Forces the next iteration to restart the search from scratch.

inline vamana::VamanaSearchParameters parameters_for_current_iteration() const

Returns the search parameters used for the current batch.

inline void next(size_t batch_size, const lib::DefaultPredicate &cancel = lib::Returns(lib::Const<false>()))

Prepares the next batch of neighbors (up to batch_size) from the index. Handles exceptions gracefully and ensures iterator state consistency.

Full Example

//! [Includes]
// SVS Dependencies
#include "svs/orchestrators/vamana.h" // bulk of the dependencies required.

// Alternative main definition
#include "svsmain.h"

// stl
#include <string>
#include <string_view>
#include <vector>
//! [Includes]

void check(bool value, std::string_view expr, svs::lib::detail::LineInfo linfo) {
    if (!value) {
        throw svs::lib::ANNException(
            "expression \"{}\" evaluated to false in {}", expr, linfo
        );
    }
}

#define CHECK(expr) check(expr, #expr, SVS_LINEINFO);

//! [Example Index Construction]
[[nodiscard]] auto make_example_index() {
    // Build the index.
    auto build_parameters = svs::index::vamana::VamanaBuildParameters{
        1.2, // alpha
        16,  // graph_max_degree
        32,  // window_size
        16,  // max_candidate_pool_size
        16,  // prune_to
        true // use_full_search_history
    };

    // Create a dataset with 7 elements, each with 4 dimensions.
    // The contents of the dataset are
    // {0, 0, 0, 0}
    // {1, 1, 1, 1}
    // ...
    // {6, 6, 6, 6}
    auto data = svs::data::SimpleData<float>(7, 4);

    // Fill data `i` with all `i`s.
    auto buffer = std::vector<float>();
    for (size_t i = 0; i < data.size(); ++i) {
        auto fi = svs::lib::narrow<float>(i);
        buffer = {fi, fi, fi, fi};
        data.set_datum(i, buffer);
    }

    // Build the index.
    return svs::index::vamana::auto_build(
        build_parameters, std::move(data), svs::DistanceL2{}, 1
    );
}
//! [Example Index Construction]

void demonstrate_iterator() {
    //! [Setup]
    auto index = make_example_index();

    // Each iteration will yield 3 elements that have not been yielded previously.
    size_t batchsize = 3;

    // Create a batch iterator over the index for the query.
    auto itr = [&]() {
        // Construct a query a query in a scoped block to demonstrate that the iterator
        // maintains an internal copy.
        auto query = std::vector<float>{3.25, 3.25, 3.25, 3.25};

        // Make a batch iterator for the query.
        return svs::VamanaIterator(index, svs::lib::as_const_span(query));
    }();
    //! [Setup]

    //! [First Iteration]
    itr.next(batchsize);
    CHECK(itr.size() == 3);
    CHECK(itr.batch_number() == 1);

    // Obtain a view of the current list candidates.
    std::span<const svs::Neighbor<size_t>> results = itr.results();
    CHECK(results.size() == 3);

    // We constructed the dataset in such a way that we know what the results should be.
    fmt::print("Neighbor 0 = {}\n", results[0].id());
    fmt::print("Neighbor 1 = {}\n", results[1].id());
    fmt::print("Neighbor 2 = {}\n", results[2].id());
    CHECK(!itr.done());
    CHECK(results[0].id() == 3);
    CHECK(results[1].id() == 4);
    CHECK(results[2].id() == 2);
    //! [First Iteration]

    //! [Next Iteration]
    // This will yield the next batch of neighbors.
    itr.next(batchsize);
    CHECK(itr.size() == 3);

    // The contents of the iterator are for batch 2.
    CHECK(itr.batch_number() == 2);

    // Update and inspect the results.
    results = itr.results();
    fmt::print("Neighbor 3 = {}\n", results[0].id());
    fmt::print("Neighbor 4 = {}\n", results[1].id());
    fmt::print("Neighbor 5 = {}\n", results[2].id());
    CHECK(!itr.done());
    CHECK(results[0].id() == 5);
    CHECK(results[1].id() == 1);
    CHECK(results[2].id() == 6);
    //! [Next Iteration]

    //! [Final Iteration]
    // So far, the iterator has yielded 6 of the 7 vectors in the dataset.
    // This call to `next()` should only yield a single neighbor - the last on in the index.
    itr.next(batchsize);
    CHECK(itr.size() == 1);
    CHECK(itr.done());
    results = itr.results();
    fmt::print("Neighbor 6 = {}\n", results[0].id());
    CHECK(results[0].id() == 0);
    //! [Final Iteration]

    //! [Beyond Final Iteration]
    // Calling `next()` again should yield no more candidates.
    itr.next(batchsize);
    CHECK(itr.size() == 0);
    CHECK(itr.done());
    //! [Beyond Final Iteration]
}

// Alternative main definition
int svs_main(std::vector<std::string> SVS_UNUSED(args)) {
    demonstrate_iterator();
    return 0;
}

SVS_DEFINE_MAIN();