Dynamic Vamana Graph Index
In this section, we cover the API and usage of the DynamicVamana graph-based index.
- class svs.DynamicVamana
Top level class for the dynamic Vamana graph index.
- __init__(self: svs::python.DynamicVamana, config_path: str, graph_loader: svs::python.GraphLoader, data_loader: Union[svs::python.VectorDataLoader, svs::python.LVQLoader], distance: svs::python.DistanceType = <DistanceType.L2: 0>, query_type: svs::python.DataType = <DataType.float32: 9>, enforce_dims: bool = False, num_threads: int = 1, debug_load_from_static: bool = False) -> None
- add(self: svs::python.DynamicVamana, points: numpy.ndarray[numpy.float32], ids: numpy.ndarray[numpy.uint64]) -> None
Add every point in points to the index, assigning the element-wise corresponding ID to each point.
- Parameters:
points – A matrix of data whose rows, corresponding to points in R^n, will be added to the index.
ids – Vector of IDs to assign to each row in points. Must have the same number of elements as points has rows.
Furthermore, all entries in ids must be unique and must not already exist in the index. If either of these conditions does not hold, an exception will be thrown without mutating the underlying index.
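The library already rejects invalid input without touching the index. If you would rather fail earlier with a more specific message, a helper along these lines can check the documented preconditions first. This is a hypothetical sketch, not part of svs: validate_add_inputs is an illustrative name, and existing_ids is assumed to be the array returned by all_ids(). Fetching every ID may be costly for very large indices.

```python
import numpy as np


def validate_add_inputs(points, ids, existing_ids):
    """Check the documented preconditions of DynamicVamana.add before calling it.

    Hypothetical helper. `existing_ids` is assumed to come from index.all_ids().
    Returns (points, ids) converted to contiguous float32 / uint64 arrays.
    """
    points = np.ascontiguousarray(points, dtype=np.float32)
    ids = np.ascontiguousarray(ids, dtype=np.uint64)
    if points.ndim != 2:
        raise ValueError("points must be a 2-D matrix (one row per vector)")
    # One ID per row of points.
    if ids.ndim != 1 or ids.shape[0] != points.shape[0]:
        raise ValueError("ids must be a vector with one entry per row of points")
    # All new IDs must be distinct from each other.
    if np.unique(ids).size != ids.size:
        raise ValueError("ids contains duplicates")
    # No new ID may already be present in the index.
    clashes = np.intersect1d(ids, np.asarray(existing_ids, dtype=np.uint64))
    if clashes.size > 0:
        raise ValueError(f"ids already present in the index: {clashes.tolist()}")
    return points, ids
```

Usage, assuming index is an existing svs.DynamicVamana: call points, ids = validate_add_inputs(new_points, new_ids, index.all_ids()) and then index.add(points, ids).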
- all_ids(self: svs::python.DynamicVamana) -> numpy.ndarray[numpy.uint64]
Return a NumPy vector of all IDs currently in the index.
- property alpha
Get/set the alpha value used when adding and deleting points.
- Type:
Read/Write (float)
- static build(parameters: svs::python.VamanaBuildParameters, data: numpy.ndarray[numpy.float32], ids: numpy.ndarray[numpy.uint64], distance_type: svs::python.DistanceType, num_threads: int) -> svs::python.DynamicVamana
Construct a Vamana index over the given data, returning a searchable index.
- Parameters:
parameters – Parameters controlling graph construction. See below for the documentation of this class.
data – The dataset to index. NOTE: SVS will maintain an internal copy of the dataset. This may change in future releases.
ids – Vector of IDs to assign to each row in data.
distance_type – The distance type to use for this dataset.
num_threads – The number of threads used to construct the index.
- compact(self: svs::python.DynamicVamana, arg0: int) -> svs::python.DynamicVamana
Remove any holes created in the graph and data by renumbering internal IDs, and shrink the underlying data structures. Following consolidate, this can potentially reduce the memory footprint of the index if a sufficient number of points were deleted.
- consolidate(self: svs::python.DynamicVamana) -> svs::python.DynamicVamana
Remove and patch around all deleted entries in the graph. This should be called after a sufficient number of deletions; otherwise, the memory consumption of the index will increase monotonically.
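Because consolidate should run only after enough deletions have accumulated, a caller typically tracks deletions and triggers cleanup past some threshold. The sketch below assumes the caller maintains the counts itself; maybe_consolidate and the 10% default threshold are illustrative choices, not SVS recommendations. The integer argument to compact is unnamed (arg0) in this reference, so the sketch forwards a caller-supplied value without assuming what it means.

```python
def maybe_consolidate(index, num_deleted, num_live, threshold=0.1, compact_arg=None):
    """Consolidate (and optionally compact) once enough points are soft-deleted.

    Hypothetical helper. num_deleted / num_live are tracked by the caller;
    the default threshold is arbitrary. Returns True if cleanup ran.
    """
    total = num_deleted + num_live
    if total == 0 or num_deleted / total < threshold:
        return False
    # Patch the graph around soft-deleted entries first...
    index.consolidate()
    # ...then optionally renumber and shrink storage. compact_arg is passed
    # through unchanged because its meaning is not documented here.
    if compact_arg is not None:
        index.compact(compact_arg)
    return True
```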
- property construction_window_size
Get/set the window size used when adding and deleting points.
- Type:
Read/Write (int)
- delete(self: svs::python.DynamicVamana, ids: numpy.ndarray[numpy.uint64]) -> None
Soft delete the IDs from the index. Soft deletion does not remove the IDs from the graph, but prevents them from being returned by future searches.
- Parameters:
ids – The IDs to delete.
Each element in ids must be unique and must correspond to a valid ID stored in the index. Otherwise, an exception will be thrown and the index will be left unchanged from before the function call.
- property dimensions
Return the logical number of dimensions for each vector in the dataset.
- property experimental_backend_string
Get a string identifying the full-type of the backend implementation.
This property is experimental and subject to change without a deprecation warning.
- Type:
Read Only (str)
- experimental_calibrate(*args, **kwargs)
Overloaded function.
experimental_calibrate(self: svs::python.DynamicVamana, queries: numpy.ndarray[float16], groundtruth: numpy.ndarray[numpy.uint32], num_neighbors: int, target_recall: float, calibration_parameters: svs::python.VamanaCalibrationParameters = <default svs::python.VamanaCalibrationParameters>) -> svs::python.VamanaSearchParameters
NOTE: This method is experimental and subject to change or removal without notice.
Run an experimental calibration routine to select the best search parameters.
- Parameters:
queries – Queries used to drive the calibration process.
groundtruth – The groundtruth for the given query set.
num_neighbors – The number of nearest neighbors to calibrate for.
target_recall – The target num_neighbors-recall-at-num_neighbors. If such a recall is possible, then calibration will find parameters that optimize performance at this recall level.
calibration_parameters – The hyper-parameters to use during calibration.
- Returns:
The best svs.VamanaSearchParameters found.
The calibration routine will also configure the index with the best found parameters. Note that calibration will use the number of threads already assigned to the index and can therefore be used to tune the algorithm to different threading amounts.
See also: svs.VamanaCalibrationParameters
experimental_calibrate(self: svs::python.DynamicVamana, queries: numpy.ndarray[numpy.float32], groundtruth: numpy.ndarray[numpy.uint32], num_neighbors: int, target_recall: float, calibration_parameters: svs::python.VamanaCalibrationParameters = <default svs::python.VamanaCalibrationParameters>) -> svs::python.VamanaSearchParameters
Identical to the float16 overload above, except that the queries are float32.
See also: svs.VamanaCalibrationParameters
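target_recall above is a num_neighbors-recall-at-num_neighbors value. To check the recall actually achieved with the calibrated parameters, for example on a held-out query set, the usual k-recall@k metric can be computed from search results and groundtruth as below. This is a sketch: k_recall_at_k is a hypothetical helper, and it assumes each groundtruth row lists the true nearest neighbors, nearest first.

```python
import numpy as np


def k_recall_at_k(result_ids, groundtruth, k):
    """Average fraction of the true k nearest neighbors found in the first k results.

    result_ids:  (num_queries, >=k) IDs returned by search.
    groundtruth: (num_queries, >=k) exact nearest-neighbor IDs, nearest first.
    """
    result_ids = np.asarray(result_ids)
    groundtruth = np.asarray(groundtruth)
    if result_ids.shape[0] != groundtruth.shape[0]:
        raise ValueError("result_ids and groundtruth must have one row per query")
    hits = 0
    for found, truth in zip(result_ids[:, :k], groundtruth[:, :k]):
        # Count how many of the true top-k appear among the returned top-k.
        hits += np.intersect1d(found, truth).size
    return hits / (k * result_ids.shape[0])
```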
- experimental_reset_performance_parameters(self: svs::python.DynamicVamana) -> None
Reset the internal performance-only parameters to built-in heuristics. This can be useful if experimenting with different dataset implementations which may need different values for performance-only parameters (such as prefetchers).
Calling this method should not affect recall.
- has_id(self: svs::python.DynamicVamana, id: int) -> bool
Return whether the ID exists in the index.
- property num_threads
Get and set the number of threads used to process queries.
- Type:
Read/Write (int)
- property query_types
Return the query element types this index is specialized for.
- reconstruct(self: svs::python.DynamicVamana, ids: numpy.ndarray[numpy.uint64]) -> numpy.ndarray[numpy.float32]
- save(self: svs::python.DynamicVamana, config_directory: str, graph_directory: str, data_directory: str) -> None
Save a constructed index to disk (useful following index construction).
- Parameters:
config_directory – Directory where index configuration information will be saved.
graph_directory – Directory where graph will be saved.
data_directory – Directory where the dataset will be saved.
Note: All directories should be separate to avoid accidental name collisions with any auxiliary files that are needed when saving the various components of the index.
If a directory does not exist, it will be created, provided its parent exists.
It is the caller’s responsibility to ensure that no existing data will be overwritten when saving the index to these directories.
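One way to satisfy these requirements is to derive three sibling directories from a single root and refuse to reuse any that already contain files. prepare_save_dirs and the config/graph/data subdirectory names are hypothetical choices for this sketch. The root is created up front so that each directory's parent exists when save runs, and the paths are returned as str because that is the parameter type shown in the signature.

```python
from pathlib import Path


def prepare_save_dirs(root):
    """Return separate (config, graph, data) directory paths under `root` as strings.

    Hypothetical helper. Raises FileExistsError if any of them already exists
    and is not an empty directory, so that save() cannot overwrite earlier output.
    """
    root = Path(root)
    names = ("config", "graph", "data")
    for name in names:
        path = root / name
        if path.exists() and (not path.is_dir() or any(path.iterdir())):
            raise FileExistsError(f"refusing to reuse {path}")
    # Create the shared parent so save() can create each leaf directory itself.
    root.mkdir(parents=True, exist_ok=True)
    return tuple(str(root / name) for name in names)
```

Usage, assuming index is an existing svs.DynamicVamana: index.save(*prepare_save_dirs("my_index")). The tuple is returned in the config, graph, data order that save expects.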
- search(self: svs::python.DynamicVamana, queries: numpy.ndarray[numpy.float32], n_neighbors: int) -> tuple
Perform a search to return the n_neighbors approximate nearest neighbors to the query.
- Parameters:
queries – NumPy vector or matrix representing the queries. If the argument is a vector, it is treated as a single query. If the argument is a matrix, individual queries are assumed to be the rows of the matrix. Returned results have a position-wise correspondence with the queries: the N-th row of the returned IDs and distances corresponds to the N-th row of the query matrix.
n_neighbors – The number of neighbors to return for this search job.
- Returns:
A tuple (I, D) where I contains the n_neighbors approximate (or exact) nearest neighbors to the queries and D contains the approximate distances.
Note: This form is returned regardless of whether the given query was a vector or a matrix.
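Calibration and recall checks need exact groundtruth. For small datasets, a brute-force search that returns the same (I, D) layout as search can provide it. This sketch assumes Euclidean (L2) distance and ranks by squared distance, which gives the same ordering; whether SVS reports squared or unsquared L2 values in D is not stated here, so compare IDs rather than raw distances. exact_l2_search is a hypothetical helper, and promoting a single query vector to a one-row matrix is an assumption about how that case is represented. The full distance matrix is held in memory, so this only suits modest sizes.

```python
import numpy as np


def exact_l2_search(data, ids, queries, n_neighbors):
    """Exact k-NN under squared Euclidean distance, returning (I, D).

    Hypothetical helper. Row N of I and D corresponds to query N; a single
    query vector is promoted to a one-row matrix.
    """
    data = np.asarray(data, dtype=np.float32)
    queries = np.atleast_2d(np.asarray(queries, dtype=np.float32))
    ids = np.asarray(ids, dtype=np.uint64)
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2 for every (query, point) pair.
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ data.T
        + (data ** 2).sum(axis=1)[None, :]
    )
    order = np.argsort(d2, axis=1, kind="stable")[:, :n_neighbors]
    I = ids[order]  # map row positions back to user-facing IDs
    D = np.take_along_axis(d2, order, axis=1)
    return I, D
```

The groundtruth parameter of experimental_calibrate is typed numpy.ndarray[numpy.uint32], so I may need I.astype(numpy.uint32) before being passed there.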
- property search_parameters
Get/set the current search parameters for the index. These parameters modify both the algorithmic properties of search (affecting recall) and non-algorithmic properties of search (affecting queries-per-second).
See also: svs.VamanaSearchParameters.
- Type:
Read/Write (svs.VamanaSearchParameters)
- property search_window_size
Get/set the size of the internal search buffer. A larger value will likely yield more accurate results at the cost of speed.
- Type:
Read/Write (int)
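Since a larger search_window_size trades speed for accuracy, a common pattern is to sweep a few values and record the quality and time of each. The sketch below uses only the documented search_window_size property and search method; sweep_search_window_size is a hypothetical helper, score is a caller-supplied function (for example, recall against groundtruth), and the timings are single wall-clock measurements rather than a rigorous benchmark. The original setting is restored afterwards.

```python
import time


def sweep_search_window_size(index, queries, n_neighbors, window_sizes, score):
    """Run search() at several search_window_size values.

    Hypothetical helper. `score` maps the returned ID matrix to a quality
    number. Returns a list of (window_size, score, seconds) tuples.
    """
    original = index.search_window_size
    results = []
    try:
        for w in window_sizes:
            index.search_window_size = w
            start = time.perf_counter()
            I, _ = index.search(queries, n_neighbors)
            elapsed = time.perf_counter() - start
            results.append((w, score(I), elapsed))
    finally:
        # Leave the index configured as it was before the sweep.
        index.search_window_size = original
    return results
```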
- property size
Return the number of elements in the indexed dataset.
- property visited_set_enabled
Deprecated
Read/Write (bool): Get/set whether the visited set is used. Enabling the visited set can be helpful if the distance computations required are relatively expensive as it can reduce redundant computations.
In general, though, it is probably faster to leave this disabled.