Distance Computation

Functor are used to implement and tailor distance computations in a number of ways. The library uses functors (i.e., structs) for a number of reasons summarised below. In general, the function svs::distance::compute() is used to compute the similarity between queries on the left and dataset elements on the right.

  • Often, distance computation needs to perform some pre-processing per query inorder to execute efficiently. Using structs allows for this behavior.

  • If structs are empty, than the extra overhead introduced (e.g., space overhead) by them is negligible compared to function references or pointers.

  • Overloading operations for structs is easier than function references.

  • Using functors allows for fancier options like type-erasure to reduce generated code if desired.

In particular, bullet point (1) deserves a little explanation. An efficient implementation for similarity metric such as cosine-similarity may involve either computing the norm of the query (externally supplied vector) or normalizing the vector. Both cases require some local scratch space for either storing the computed norm or the normalized query (to maintain const-correctness of the externally supplied query). Using a distance functor allows this scratch space to exist. A single struct can then be re-used to process multiple queries in series or copied to process queries in parallel.

On the other hand, not all distance implementations (such as L-p distances or inner-product) may not require such state. This leads to the idea of argument fixing.

This is a test compute() of cross referencing.

Argument Fixing

To support functors both requiring and not needing query preprocessing, the function svs::distance::maybe_fix_argument() should be called on a query a before any distance computations requiring a as the left-hand argument. Note that the right-hand argument b can vary at will.

A distance function MyFunctor may opt-in to argument fixing by defining the method fix_argument as follows:

struct MyStruct {
    // Boiler-plate
    template<typename QueryType>
    void fix_argument(const QueryType& query) { /*impl*/ }
};

The possible values of QueryType should be restricted as appropriate.

API Documentation

template<typename T>
using compare_t = typename T::compare

Obtain the type of the comparator used for the distance functor T.

template<typename F, typename A>
void maybe_fix_argument(F &f, A a)

Perform an argument fixing of query a for the distance functor f. This provides a hook mechanism allowing distance functors to preprocess the query before performing any distance computations if desired.

This should be called before calling svs::distance::compute where b is allowed to vary but a is not.

Modifications to a require a new call to distance::maybe_fix_argument.

If distance functors wish to opt-in to argument fixing, the must define a member function fix_argument(A) which will be automatically called by distance::maybe_fix_argument.

If distance functors do not define such a member function, then distance::maybe_fix_argument becomes a no-op unless f defines the static member variable

static constexpr bool must_fix_argument = true;
in which case, missing overloads to f.fix_argument(A) become compiler errors.

template<typename A, typename B, HasOneArgCompute<B> F>
float compute(F &f, A, B b)

Perform a distance computation with the distance functor f, query a and element b.

Functors f can opt-in to this method in one of three ways:

  • Direct overloading of distance::compute. This can be used to extend the behavior of existing functors without modification.

  • If f does not implement argument fixing for this combination of types, then f may implement a member function f.compute(a, b) which will be called automatically.

  • if f does implement argument fixing for this type combination, then f may implement a single argument member function f.compute(b) which will be called.

Argument fixing is implied if f.fix_argument(a) is valid. In order to apply to all distance functors, distance::maybe_fix_argument should be called each time the value of a changes.

template<typename T>
compare_t<T> comparator(const T &x)

Return a comparison functgor for distance functors T. For example, distance functors that wish to minimize distances may return std::less<> while those those wishing to maximize may return std::greater<>.

Custom types can provide support for distance::comparator in one of three ways:

  • Direct overloading of distance::comparator.

  • If the desired comparison functor is default constructible, providing a type alias T::compare.

  • Providing a member function x.compare() returning the functor.

When T defines both a member function and a type alias, the member function takes precedence.

Parameters:

x – The distance functor to obtain the comparison functor for.

Returns:

A comparison functor f with an overloaded operator() such that

f(float, float) -> bool
is valid.

template<typename T>
constexpr bool implicitly_broadcastable()

Return whether distance functors of type T are implicitly broadcastable.

To be implicitly broadcastable, must not require argument fixing. In other words, it must be safe to call distance::compute with varying left and right-hand arguments without needing to call maybe_fix_argument.

To opt into implicit broadcasting, classes should implement a constexpr static boolean member implicit_broadcast = true.

template<typename T>
class BroadcastDistance
#include <svs/concepts/distance.h>

Efficiently create copies of the distance functor f of type T to allow batch distance computations involving multiple queries.

Be default, this is done by invoking the copy constructor the necessary number of times. However, if explicit copies of the functor f are not needed for correctness (i.e., distance::implicitly_broadcastable<T>() == true, than these copies will not be made resulting in a slightly more efficient implementation.

Invariants:

  • size() cannot be zero. This is because distance functions requiring broadcasting are often stateful and may have some shared (usually read-only) state.

    If the underlying storage for this class ever becomes empty, we lose that storage.

Public Functions

inline BroadcastDistance(T distance, size_t ncopies)

Construct ncopies of the distance functor distance.

inline T &operator[](size_t i)

Retrieve the ith distance functor.

inline size_t size() const

Return the number of distance functors.

inline void resize(size_t new_size)

Resize the number of stored distance functors.