Dynamic Vamana Index
Main Dynamic Vamana Class
-
template<graphs::MemoryGraph Graph, typename Data, typename Dist>
class MutableVamanaIndex Public Types
-
using search_parameters_type = VamanaSearchParameters
The type of the configurable search parameters.
Public Functions
-
template<typename ExternalIds>
inline MutableVamanaIndex(const VamanaBuildParameters ¶meters, Data data, const ExternalIds &external_ids, Dist distance_function, size_t num_threads) Build a graph from scratch.
-
inline MutableVamanaIndex(const VamanaIndexParameters &config, data_type data, graph_type graph, const Dist &distance_function, IDTranslator translator, threads::NativeThreadPool threadpool)
Post re-load constructor.
Preconditions
data.size() == graph.n_nodes(): The graph and the data have the same number of entries.
The data and graph were saved with no “holes”. In otherwords, the index was consolidated and compacted prior to saving.
The span of internal ID’s in translator covers exactly
[0, data.size())
.
-
inline float get_alpha() const
Get the alpha value used for pruning while mutating the graph.
See also
set_alpha, get_construction_window_size, set_construction_window_size
-
inline void set_alpha(float alpha)
Set the alpha value used for pruning while mutating the graph.
See also
get_alpha, get_construction_window_size, set_construction_window_size
-
inline size_t get_construction_window_size() const
Get the window size used the mutating the graph.
See also
set_construction_window_size, get_alpha, set_alpha
-
inline void set_construction_window_size(size_t window_size)
Set the window size used the mutating the graph.
See also
get_construction_window_size, get_alpha, set_alpha
-
inline Idx translate_external_id(size_t e) const
Get the internal ID mapped to be
e
.Requires that mapping for
e
exists. Otherwise, all bets are off.See also
has_id, translate_internal_id
- Parameters:
e – The external ID to translate to an internal ID.
-
inline bool has_id(size_t e) const
Check whether the external ID
e
exists in the index.
-
inline size_t translate_internal_id(Idx i) const
Get the external ID mapped to be
i
.Requires that mapping for
i
exists. Otherwise, all bets are off.- Parameters:
i – The internal ID to translate to an external ID.
-
template<typename F>
inline void on_ids(F &&f) const Call the functor with all external IDs in the index.
- Parameters:
f – A functor with an overloaded
operator()(size_t)
method. Called on each external ID in the index.
-
inline std::vector<size_t> external_ids() const
Return a vector of all valid external IDs present in the index.
-
inline size_t size() const
Return the number of valid (non-deleted) entries in the index.
- template<class Dims, class Base> requires (std::tuple_size_v< Dims >==2) void translate_to_external(DenseArray< size_t
Translate in-place a collection of internal IDs to external IDs.
Modifies each entry in
ids
in place, assumes that entry is an internal ID and remaps it to its external ID.This is used as a post-processing step following search to return the correct external neighbors to the caller, allowing inner search routines to simply return local IDs.
Several implementation notes: (1) This is definitely not safe to call multiple times on the same array for obvious reasons.
(2) All entries in
ids
should have valid translations. Otherwise, this function’s behavior is undefined.- Parameters:
ids – The
DenseArray
of internal IDs to modify.
-
inline auto get_datum(size_t e) const
Get the raw data for external id
e
.
-
inline size_t dimensions() const
Return the dimensionality of the stored dataset.
TODO (MH): This somewhat limits us to using only R^n type datasets. I’d like to see this generalized somewhat.
-
template<typename QueryType, typename I>
inline void exhaustive_search(const data::ConstSimpleDataView<QueryType> &queries, size_t num_neighbors, QueryResultView<I> result) Perform an exhaustive search on the current state of the index. Useful to understand how well the graph search is doing after index mutation.
-
inline constexpr std::string_view name() const
Descriptive Name
-
template<std::integral I>
inline void clear_lists(const std::vector<I> &local_ids) Clear the adjacency lists for the given local ids.
This ensures that during the rebuild-phase, we don’t get any zombie (previously deleted nodes) occuring in the new adjacency lists.
-
template<data::ImmutableMemoryDataset Points, class ExternalIds>
inline std::vector<size_t> add_points(const Points &points, const ExternalIds &external_ids) Add the points with the given external IDs to the dataset.
- Parameters:
points – Dataset of points to add.
external_ids – The external IDs of the corresponding points. Must be a container implementing forward iteration.
-
template<typename T>
inline void delete_entries(const T &ids) Delete all IDs stored in the random-access container
ids
.Pre-conditions:
All indices present in
ids
belong to valid slots.
Post-conditions:
Deleted slots will not be returned in future calls
search
.
Implementation Nodes:
The deletion that happens is a “soft” deletion. This means that the corresponding entries are still present in both the dataset and the graph, and will be navigated through during searched.
However, entries marked as
deleted
will not be returned from searches.Delete consolidation should happen once a large enough percentage of slots have been soft deleted.
Delete consolidation performs the actual removal of deleted entries from the graph.
-
inline std::vector<Idx> nonmissing_indices() const
Return all the non-missing internal IDs.
This includes both valid and soft-deleted entries.
- inline void compact (Idx batch_size=1 '000)
Compact the data and the graph.
- Parameters:
batch_size – Granularity at which points are shuffled. Setting this higher can improve performance but requires more working memory.
-
inline void reset_performance_parameters()
Reset performance parameters to their default values for this index.
Parameters affected are only those that modify throughput on a given architecture. Accuracy results should not change as a side-effect of calling this function.
-
template<std::unsigned_integral I, svs::Arithmetic T>
inline void reconstruct_at(data::SimpleDataView<T> dst, std::span<const I> ids) Reconstruct vectors.
Reconstruct each vector indexed by an external ID and store the results into
dst
.Preconditions:
ids.size() == svs::getsize<0>(dst)
: Each ID has a corresponding entry in the destination array.All indices are valid external IDs for this index.
svs::getsize<1>(dst) == dimensions()
: The space allocated for each vector indst
is correct.
An exception will be thrown if any of these pre-conditions does not hold. If such an exception is thrown, the argument
dst
will be left unmodified.
-
inline void debug_check_invariants(bool allow_deleted) const
Verify the invariants of this data structure.
- Parameters:
allow_deleted – Enable or disable deleted entries.
-
inline void debug_check_size() const
Make sure that the capacities of the main data structures (graph, data, metadata) agree.
-
inline void debug_check_graph_consistency(bool allow_deleted = false) const
Ensure the graph is in a consistent state.
In this case, consistency means the that the adjacency lists for all non-deleted vertices contain only non-deleted vertices.
This operation should be run after
debug_check_size()
to ensure that the sizes of the underlying data structures are consistent.- Parameters:
allow_deleted – Flag to indicate if nodes marked as
Deleted
are okay for consideration. Following a consolidation, this should befalse
. Otherwise, this should betrue
.
-
using search_parameters_type = VamanaSearchParameters
Dynamic Vamana Entry Point
-
template<lib::LazyInvocable<> GraphLoader, typename DataLoader, typename Distance>
auto svs::index::vamana::auto_dynamic_assemble(const std::filesystem::path &config_path, const GraphLoader &graph_loader, const DataLoader &data_loader, Distance distance, size_t num_threads, bool debug_load_from_static = false)