Dynamic Vamana Index

Main Dynamic Vamana Class

template<graphs::MemoryGraph Graph, typename Data, typename Dist>
class MutableVamanaIndex

Public Types

using search_parameters_type = VamanaSearchParameters

The type of the configurable search parameters.

Public Functions

template<typename ExternalIds, typename ThreadPoolProto>
inline MutableVamanaIndex(const VamanaBuildParameters &parameters, Data data, const ExternalIds &external_ids, Dist distance_function, ThreadPoolProto threadpool_proto, svs::logging::logger_ptr logger = svs::logging::get())

Build a graph from scratch.

template<threads::ThreadPool Pool>
inline MutableVamanaIndex(const VamanaIndexParameters &config, data_type data, graph_type graph, const Dist &distance_function, IDTranslator translator, Pool threadpool, svs::logging::logger_ptr logger = svs::logging::get())

Post re-load constructor.

Preconditions

  • data.size() == graph.n_nodes(): The graph and the data have the same number of entries.

  • The data and graph were saved with no “holes”. In otherwords, the index was consolidated and compacted prior to saving.

  • The span of internal ID’s in translator covers exactly [0, data.size()).

inline svs::logging::logger_ptr get_logger() const

Getter method for logger.

Accessors

inline float get_alpha() const

Get the alpha value used for pruning while mutating the graph.

inline void set_alpha(float alpha)

Set the alpha value used for pruning while mutating the graph.

inline size_t get_graph_max_degree() const

Get the graph_max_degree used while mutating the graph.

inline size_t get_max_candidates() const

Get the max candidate pool size used while mutating the graph.

inline void set_max_candidates(size_t max_candidates)

Set the max candidate pool size to be used while mutating the graph.

inline size_t get_prune_to() const

Get the prune_to value used while mutating the graph.

inline void set_prune_to(size_t prune_to)

Set the prune_to value to be used while mutating the graph.

inline size_t get_construction_window_size() const

Get the window size used while mutating the graph.

inline void set_construction_window_size(size_t window_size)

Set the window size to be used while mutating the graph.

inline bool get_full_search_history() const

Return whether the full search history is being used while mutating the graph.

inline void set_full_search_history(bool enable)

Enable using the full search history for candidate generation while mutating the graph.

inline Idx translate_external_id(size_t e) const

Get the internal ID mapped to be e.

Requires that mapping for e exists. Otherwise, all bets are off.

See also

has_id, translate_internal_id

Parameters:

e – The external ID to translate to an internal ID.

inline bool has_id(size_t e) const

Check whether the external ID e exists in the index.

inline size_t translate_internal_id(Idx i) const

Get the external ID mapped to be i.

Requires that mapping for i exists. Otherwise, all bets are off.

Parameters:

i – The internal ID to translate to an external ID.

template<typename F>
inline void on_ids(F &&f) const

Call the functor with all external IDs in the index.

Parameters:

f – A functor with an overloaded operator()(size_t) method. Called on each external ID in the index.

inline std::vector<size_t> external_ids() const

Return a vector of all valid external IDs present in the index.

inline size_t size() const

Return the number of valid (non-deleted) entries in the index.

template<class Dims, class Base>
inline void translate_to_external(DenseArray<size_t, Dims, Base> &ids)

Translate in-place a collection of internal IDs to external IDs.

Modifies each entry in ids in place, assumes that entry is an internal ID and remaps it to its external ID.

This is used as a post-processing step following search to return the correct external neighbors to the caller, allowing inner search routines to simply return local IDs.

Several implementation notes: (1) This is definitely not safe to call multiple times on the same array for obvious reasons.

(2) All entries in ids should have valid translations. Otherwise, this function’s behavior is undefined.

Parameters:

ids – The DenseArray of internal IDs to modify.

inline auto get_datum(size_t e) const

Get the raw data for external id e.

inline size_t dimensions() const

Return the dimensionality of the stored dataset.

TODO (MH): This somewhat limits us to using only R^n type datasets. I’d like to see this generalized somewhat.

inline Dist distance_function() const

Return a unique instance of the distance function.

template<typename QueryType, typename I>
inline void exhaustive_search(const data::ConstSimpleDataView<QueryType> &queries, size_t num_neighbors, QueryResultView<I> result)

Perform an exhaustive search on the current state of the index. Useful to understand how well the graph search is doing after index mutation.

inline constexpr std::string_view name() const

Descriptive Name

template<std::integral I>
inline void clear_lists(const std::vector<I> &local_ids)

Clear the adjacency lists for the given local ids.

This ensures that during the rebuild-phase, we don’t get any zombie (previously deleted nodes) occurring in the new adjacency lists.

template<data::ImmutableMemoryDataset Points, class ExternalIds>
inline std::vector<size_t> add_points(const Points &points, const ExternalIds &external_ids, bool reuse_empty = false)

Add the points with the given external IDs to the dataset.

When delete_entries is called, a soft deletion is performed, marking the entries as deleted. When consolidate is called, the state of these deleted entries becomes empty. When add_points is called with the reuse_empty flag enabled, the memory is scanned from the beginning to locate and fill these empty entries with new points.

Parameters:
  • points – Dataset of points to add.

  • external_ids – The external IDs of the corresponding points. Must be a container implementing forward iteration.

  • reuse_empty – A flag that determines whether to reuse empty entries that may exist after deletion and consolidation. When enabled, scan from the beginning to find and fill these empty entries when adding new points.

template<typename T>
inline size_t delete_entries(const T &ids)

Delete all IDs stored in the random-access container ids.

Pre-conditions:

  • All indices present in ids belong to valid slots.

Post-conditions:

  • Deleted slots will not be returned in future calls search.

Implementation Nodes:

  • The deletion that happens is a “soft” deletion. This means that the corresponding entries are still present in both the dataset and the graph, and will be navigated through during searched.

    However, entries marked as deleted will not be returned from searches.

  • Delete consolidation should happen once a large enough percentage of slots have been soft deleted.

    Delete consolidation performs the actual removal of deleted entries from the graph.

inline std::vector<Idx> nonmissing_indices() const

Return all the non-missing internal IDs.

This includes both valid and soft-deleted entries.

inline void compact (Idx batch_size=1 '000)

Compact the data and the graph.

Parameters:

batch_size – Granularity at which points are shuffled. Setting this higher can improve performance but requires more working memory.

inline size_t get_num_threads() const

Return the current number of threads used for search.

See also

set_num_threads

template<threads::ThreadPool Pool>
inline void set_threadpool(Pool threadpool)

Destroy the original thread pool and set to the provided one.

ThreadPool

An acceptable thread pool should implement two methods:

  • size_t size(). This method should return the number of threads used in the thread pool.

  • void parallel_for(std::function<void(size_t)> f, size_t n). This method should execute f. Here, f(i) represents a task on the i^th partition, and n represents the number of partitions that need to be executed.

Parameters:

threadpool – An acceptable thread pool.

inline threads::ThreadPoolHandle &get_threadpool_handle()

Return the current thread pool handle.

inline void reset_performance_parameters()

Reset performance parameters to their default values for this index.

Parameters affected are only those that modify throughput on a given architecture. Accuracy results should not change as a side-effect of calling this function.

template<std::unsigned_integral I, svs::Arithmetic T>
inline void reconstruct_at(data::SimpleDataView<T> dst, std::span<const I> ids)

Reconstruct vectors.

Reconstruct each vector indexed by an external ID and store the results into dst.

Preconditions:

  • ids.size() == svs::getsize<0>(dst): Each ID has a corresponding entry in the destination array.

  • All indices are valid external IDs for this index.

  • svs::getsize<1>(dst) == dimensions(): The space allocated for each vector in dst is correct.

An exception will be thrown if any of these pre-conditions does not hold. If such an exception is thrown, the argument dst will be left unmodified.

template<typename F>
inline void experimental_escape_hatch(F &&f) const

Invoke the provided callable with constant references to the contained graph, data, and entry points.

This function is meant to provide a means for implementing experimental algorithms on the contained data structures.

inline void debug_check_invariants(bool allow_deleted) const

Verify the invariants of this data structure.

Parameters:

allow_deleted – Enable or disable deleted entries.

inline void debug_check_size() const

Make sure that the capacities of the main data structures (graph, data, metadata) agree.

inline void debug_check_graph_consistency(bool allow_deleted = false) const

Ensure the graph is in a consistent state.

In this case, consistency means the that the adjacency lists for all non-deleted vertices contain only non-deleted vertices.

This operation should be run after debug_check_size() to ensure that the sizes of the underlying data structures are consistent.

Parameters:

allow_deleted – Flag to indicate if nodes marked as Deleted are okay for consideration. Following a consolidation, this should be false. Otherwise, this should be true.

template<typename ExternalId, typename Query>
inline double get_distance(const ExternalId &external_id, const Query &query) const

Compute the distance between an external vector and a vector in the index.

Dynamic Vamana Entry Point

template<typename GraphLoader, typename DataLoader, typename Distance, typename ThreadPoolProto>
auto svs::index::vamana::auto_dynamic_assemble(const std::filesystem::path &config_path, GraphLoader &&graph_loader, DataLoader &&data_loader, Distance distance, ThreadPoolProto threadpool_proto, bool debug_load_from_static = false, svs::logging::logger_ptr logger = svs::logging::get())