Flat Index
Main Flat Class
-
template<data::ImmutableMemoryDataset Data, typename Dist, typename Ownership = OwnsMembers>
class FlatIndex Implementation of the Flat index.
The mid-level implementation for the flat index that uses exhaustive search to find the exact nearest neighbors (within the limitations of possibly quantization error for the dataset or floating-point error for some distance functors).
NOTE: This method is not as performant as other index methods. It is meant to return the exact rather than approximate nearest neighbors and thus must exhaustively search the whole dataset.
- Template Parameters:
Data – The full type of the dataset being indexed.
Dist – The distance functor used to compare queries with the elements of the dataset.
Ownership – Implementation detail and may be ommitted for most use cases.
Public Types
Public Functions
-
template<typename ThreadPoolProto>
inline FlatIndex(Data data, Dist distance, ThreadPoolProto &&threadpool_proto)
requires std Construct a new index from constituent parts.
- Template Parameters:
ThreadPoolProto – The type of the threadpool proto type. See notes on the corresponding parameter below.
- Parameters:
data – The data to use for the index. The resulting index will take ownership of the passed argument.
distance – The distance functor to use to compare queries with dataset elements.
threadpool_proto – Something that can be used to build a threadpool using
threads::as_threadpool
. In practice, this means thatthreapool_proto
can be either a threadpool directly, or an integer. In the latter case, a new threadpool will be constructed usingthreadpool_proto
as the number of threads to create.
-
inline size_t size() const
Return the number of independent entries in the index.
-
inline size_t dimensions() const
Return the logical number of dimensions of the indexed vectors.
-
template<typename QueryType, typename Pred = lib::Returns<lib::Const<true>>>
inline void search(QueryResultView<size_t> result, const data::ConstSimpleDataView<QueryType> &queries, const search_parameters_type &search_parameters, Pred predicate = lib::Returns(lib::Const<true>())) Fill the result with the
num_neighbors
nearest neighbors for each query.Preconditions:
The following pre-conditions must hold. Otherwise, the behavior is undefined.
result.n_queries() == queries.size()
result.n_neighbors() == num_neighbors
.The value type of
queries
is compatible with the value type of the index dataset with respect to the stored distance functor.
Implementation Details
The internal call stack looks something like this.
{}search: Prepare scratch space and perform tiling over the dataset. | +-> search_subset: multi-threaded search of all queries over the current subset of the dataset. Partitions up the queries according to query batch size and dynamically load balances query partition among worker threads. | +-> search_patch: Bottom level routine meant to run on a single thread. Compute the distances between a subset of the queries and a subset of the data and maintines the `num_neighbors` best results seen so far.
- Template Parameters:
Queries – The full type of the queries.
Pred – The type of the optional predicate.
- Parameters:
queries – The queries. Each entry will be processed.
num_neighbors – The number of approximate nearest neighbors to return for each query.
result – The result data structure to populate. Row
i
in the result corresponds to the neighbors for thei
th query. Neighbors within each row are ordered from nearest to furthest.predicate – A predicate functor that can be used to exclude certain dataset elements from consideration. This functor must implement
bool operator()(size_t)
where thesize_t
argument is an index in[0, data.size())
. If the predicate returnstrue
, that dataset element will be considered.
-
inline size_t get_num_threads() const
Return the current number of threads used for search.
See also
-
inline void set_num_threads(size_t num_threads)
Set the number of threads used for search.
Implementation note: The number of threads cannot be zero. If zero is passed to this method, it will be silently changed to 1.
See also
- Parameters:
num_threads – The new number of threads to use.
Public Static Functions
-
static inline bool can_change_threads()
Return whether this implementation can dynamically change the number of threads.
Flat Entry Point
-
template<typename DataProto, typename Distance, typename ThreadPoolProto>
auto svs::index::flat::auto_assemble(DataProto &&data_proto, Distance distance, ThreadPoolProto threadpool_proto) Entry point for loading a Flat index.
This method provides much of the heavy lifting for constructing a Flat index from a data file on disk or a dataset in memory.
- Parameters:
data_proto – Data prototype. See expanded notes.
distance – The distance functor to use to compare queries with elements of the dataset.
threadpool_proto – Precursor for the thread pool to use. Can either be a threadpool instance of an integer specifying the number of threads to use.