Dynamic Vamana Index

Main Dynamic Vamana Class

template<graphs::MemoryGraph Graph, typename Data, typename Dist>
class MutableVamanaIndex

Public Types

using search_parameters_type = VamanaSearchParameters

The type of the configurable search parameters.

Public Functions

template<typename ExternalIds>
inline MutableVamanaIndex(const VamanaBuildParameters &parameters, Data data, const ExternalIds &external_ids, Dist distance_function, size_t num_threads)

Build a graph from scratch.

inline MutableVamanaIndex(const VamanaIndexParameters &config, data_type data, graph_type graph, const Dist &distance_function, IDTranslator translator, threads::NativeThreadPool threadpool)

Post re-load constructor.

Preconditions

  • data.size() == graph.n_nodes(): The graph and the data have the same number of entries.

  • The data and graph were saved with no “holes”. In otherwords, the index was consolidated and compacted prior to saving.

  • The span of internal ID’s in translator covers exactly [0, data.size()).

inline float get_alpha() const

Get the alpha value used for pruning while mutating the graph.

See also

set_alpha, get_construction_window_size, set_construction_window_size

inline void set_alpha(float alpha)

Set the alpha value used for pruning while mutating the graph.

See also

get_alpha, get_construction_window_size, set_construction_window_size

inline size_t get_construction_window_size() const

Get the window size used the mutating the graph.

See also

set_construction_window_size, get_alpha, set_alpha

inline void set_construction_window_size(size_t window_size)

Set the window size used the mutating the graph.

See also

get_construction_window_size, get_alpha, set_alpha

inline Idx translate_external_id(size_t e) const

Get the internal ID mapped to be e.

Requires that mapping for e exists. Otherwise, all bets are off.

See also

has_id, translate_internal_id

Parameters:

e – The external ID to translate to an internal ID.

inline bool has_id(size_t e) const

Check whether the external ID e exists in the index.

inline size_t translate_internal_id(Idx i) const

Get the external ID mapped to be i.

Requires that mapping for i exists. Otherwise, all bets are off.

Parameters:

i – The internal ID to translate to an external ID.

template<typename F>
inline void on_ids(F &&f) const

Call the functor with all external IDs in the index.

Parameters:

f – A functor with an overloaded operator()(size_t) method. Called on each external ID in the index.

inline std::vector<size_t> external_ids() const

Return a vector of all valid external IDs present in the index.

inline size_t size() const

Return the number of valid (non-deleted) entries in the index.

template<class Dims, class Base>
inline void translate_to_external(DenseArray<size_t, Dims, Base> &ids)

Translate in-place a collection of internal IDs to external IDs.

Modifies each entry in ids in place, assumes that entry is an internal ID and remaps it to its external ID.

This is used as a post-processing step following search to return the correct external neighbors to the caller, allowing inner search routines to simply return local IDs.

Several implementation notes: (1) This is definitely not safe to call multiple times on the same array for obvious reasons.

(2) All entries in ids should have valid translations. Otherwise, this function’s behavior is undefined.

Parameters:

ids – The DenseArray of internal IDs to modify.

inline auto get_datum(size_t e) const

Get the raw data for external id e.

inline size_t dimensions() const

Return the dimensionality of the stored dataset.

TODO (MH): This somewhat limits us to using only R^n type datasets. I’d like to see this generalized somewhat.

inline Dist distance_function() const

Return a unique instance of the distance function.

template<typename QueryType, typename I>
inline void exhaustive_search(const data::ConstSimpleDataView<QueryType> &queries, size_t num_neighbors, QueryResultView<I> result)

Perform an exhaustive search on the current state of the index. Useful to understand how well the graph search is doing after index mutation.

inline constexpr std::string_view name() const

Descriptive Name

template<std::integral I>
inline void clear_lists(const std::vector<I> &local_ids)

Clear the adjacency lists for the given local ids.

This ensures that during the rebuild-phase, we don’t get any zombie (previously deleted nodes) occuring in the new adjacency lists.

template<data::ImmutableMemoryDataset Points, class ExternalIds>
inline std::vector<size_t> add_points(const Points &points, const ExternalIds &external_ids)

Add the points with the given external IDs to the dataset.

Parameters:
  • points – Dataset of points to add.

  • external_ids – The external IDs of the corresponding points. Must be a container implementing forward iteration.

template<typename T>
inline void delete_entries(const T &ids)

Delete all IDs stored in the random-access container ids.

Pre-conditions:

  • All indices present in ids belong to valid slots.

Post-conditions:

  • Deleted slots will not be returned in future calls search.

Implementation Nodes:

  • The deletion that happens is a “soft” deletion. This means that the corresponding entries are still present in both the dataset and the graph, and will be navigated through during searched.

    However, entries marked as deleted will not be returned from searches.

  • Delete consolidation should happen once a large enough percentage of slots have been soft deleted.

    Delete consolidation performs the actual removal of deleted entries from the graph.

inline std::vector<Idx> nonmissing_indices() const

Return all the non-missing internal IDs.

This includes both valid and soft-deleted entries.

inline void compact (Idx batch_size=1 '000)

Compact the data and the graph.

Parameters:

batch_size – Granularity at which points are shuffled. Setting this higher can improve performance but requires more working memory.

inline void reset_performance_parameters()

Reset performance parameters to their default values for this index.

Parameters affected are only those that modify throughput on a given architecture. Accuracy results should not change as a side-effect of calling this function.

template<std::unsigned_integral I, svs::Arithmetic T>
inline void reconstruct_at(data::SimpleDataView<T> dst, std::span<const I> ids)

Reconstruct vectors.

Reconstruct each vector indexed by an external ID and store the results into dst.

Preconditions:

  • ids.size() == svs::getsize<0>(dst): Each ID has a corresponding entry in the destination array.

  • All indices are valid external IDs for this index.

  • svs::getsize<1>(dst) == dimensions(): The space allocated for each vector in dst is correct.

An exception will be thrown if any of these pre-conditions does not hold. If such an exception is thrown, the argument dst will be left unmodified.

inline void debug_check_invariants(bool allow_deleted) const

Verify the invariants of this data structure.

Parameters:

allow_deleted – Enable or disable deleted entries.

inline void debug_check_size() const

Make sure that the capacities of the main data structures (graph, data, metadata) agree.

inline void debug_check_graph_consistency(bool allow_deleted = false) const

Ensure the graph is in a consistent state.

In this case, consistency means the that the adjacency lists for all non-deleted vertices contain only non-deleted vertices.

This operation should be run after debug_check_size() to ensure that the sizes of the underlying data structures are consistent.

Parameters:

allow_deleted – Flag to indicate if nodes marked as Deleted are okay for consideration. Following a consolidation, this should be false. Otherwise, this should be true.

Dynamic Vamana Entry Point

template<lib::LazyInvocable<> GraphLoader, typename DataLoader, typename Distance>
auto svs::index::vamana::auto_dynamic_assemble(const std::filesystem::path &config_path, const GraphLoader &graph_loader, const DataLoader &data_loader, Distance distance, size_t num_threads, bool debug_load_from_static = false)