Flat Index

Main Flat Class

template<data::ImmutableMemoryDataset Data, typename Dist, typename Ownership = OwnsMembers>
class FlatIndex

Implementation of the Flat index.

The mid-level implementation for the flat index that uses exhaustive search to find the exact nearest neighbors (within the limitations of possibly quantization error for the dataset or floating-point error for some distance functors).

NOTE: This method is not as performant as other index methods. It is meant to return the exact rather than approximate nearest neighbors and thus must exhaustively search the whole dataset.

Template Parameters:
  • Data – The full type of the dataset being indexed.

  • Dist – The distance functor used to compare queries with the elements of the dataset.

  • Ownership – Implementation detail and may be ommitted for most use cases.

Public Types

using distance_type = Dist

The type of the distance functor.

using data_type = Data

The type of dataset.

Public Functions

template<typename ThreadPoolProto>
inline FlatIndex(Data data, Dist distance, ThreadPoolProto &&threadpool_proto)

Construct a new index from constituent parts.

Template Parameters:

ThreadPoolProto – The type of the threadpool proto type. See notes on the corresponding parameter below.

Parameters:
  • data – The data to use for the index. The resulting index will take ownership of the passed argument.

  • distance – The distance functor to use to compare queries with dataset elements.

  • threadpool_proto – Something that can be used to build a threadpool using threads::as_threadpool. In practice, this means that threapool_proto can be either a threadpool directly, or an integer. In the latter case, a new threadpool will be constructed using threadpool_proto as the number of threads to create.

inline size_t size() const

Return the number of independent entries in the index.

inline size_t dimensions() const

Return the logical number of dimensions of the indexed vectors.

template<typename QueryType, typename Pred = lib::Returns<lib::Const<true>>>
inline void search(QueryResultView<size_t> result, const data::ConstSimpleDataView<QueryType> &queries, const search_parameters_type &search_parameters, const lib::DefaultPredicate &cancel = lib::Returns(lib::Const<false>()), Pred predicate = lib::Returns(lib::Const<true>()))

Fill the result with the num_neighbors nearest neighbors for each query.

Preconditions:

The following pre-conditions must hold. Otherwise, the behavior is undefined.

  • result.n_queries() == queries.size()

  • result.n_neighbors() == num_neighbors.

  • The value type of queries is compatible with the value type of the index dataset with respect to the stored distance functor.

Implementation Details

The internal call stack looks something like this.

search: Prepare scratch space and perform tiling over the dataset.
  |
  +-> search_subset: multi-threaded search of all queries over the current subset
      of the dataset. Partitions up the queries according to query batch size
      and dynamically load balances query partition among worker threads.
        |
        +-> search_patch: Bottom level routine meant to run on a single thread.
            Compute the distances between a subset of the queries and a subset
            of the data and maintines the `num_neighbors` best results seen so far.
{}

Template Parameters:
  • Queries – The full type of the queries.

  • Pred – The type of the optional predicate.

Parameters:
  • result – The result data structure to populate. Row i in the result corresponds to the neighbors for the ith query. Neighbors within each row are ordered from nearest to furthest. num_neighbors is computed from the number of columns in result.

  • queries – A dense collection of queries in R^n.

  • search_parameters – search parameters to use for the search.

  • cancel – A predicate called during the search to determine if the search should be cancelled.

  • predicate – A predicate functor that can be used to exclude certain dataset elements from consideration. This functor must implement bool operator()(size_t) where the size_t argument is an index in [0, data.size()). If the predicate returns true, that dataset element will be considered.

inline size_t get_num_threads() const

Return the current number of threads used for search.

See also

set_num_threads

inline void set_num_threads(size_t num_threads)

Set the number of threads used for search.

Implementation note: The number of threads cannot be zero. If zero is passed to this method, it will be silently changed to 1.

See also

get_num_threads

Parameters:

num_threads – The new number of threads to use.

Public Static Functions

static inline bool can_change_threads()

Return whether this implementation can dynamically change the number of threads.

Flat Entry Point

template<typename DataProto, typename Distance, typename ThreadPoolProto>
auto svs::index::flat::auto_assemble(DataProto &&data_proto, Distance distance, ThreadPoolProto threadpool_proto)

Entry point for loading a Flat index.

This method provides much of the heavy lifting for constructing a Flat index from a data file on disk or a dataset in memory.

data_loader

The data loader should be any object loadable via svs::detail::dispatch_load returning a Vamana compatible dataset. Concrete examples include:

Parameters:
  • data_proto – Data prototype. See expanded notes.

  • distance – The distance functor to use to compare queries with elements of the dataset.

  • threadpool_proto – Precursor for the thread pool to use. Can either be a threadpool instance of an integer specifying the number of threads to use.