= sycl_ext_oneapi_virtual_functions

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]

// Set the default source code type in this document to C++,
// for syntax highlighting purposes.  This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) 2024-2024 Intel Corporation.  All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc.  OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

This extension is written against the SYCL 2020 revision 8 specification.  All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.

This extension also depends on the following other SYCL extensions:

* link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[
  sycl_ext_oneapi_kernel_properties]
* link:../experimental/sycl_ext_oneapi_properties.asciidoc[
  sycl_ext_oneapi_properties]
* link:../experimental/sycl_ext_oneapi_named_sub_group_sizes.asciidoc[
  sycl_ext_oneapi_named_sub_group_sizes]

== Status

This is a proposed extension specification, intended to gather community
feedback.  Interfaces defined in this specification may not be implemented yet
or may be in a preliminary state.  The specification itself may also change in
incompatible ways before it is finalized.  *Shipping software products should
not rely on APIs defined in this specification.*

== Backend support status

The APIs in this extension may be used only on a device that has
`aspect::ext_oneapi_virtual_functions`.  The application must check that the
device has this aspect before submitting a kernel using any of the APIs in this
extension.  If the application fails to do this, the implementation throws
a synchronous exception with the `errc::kernel_not_supported` error code
when the kernel is submitted to the queue.

== Overview

The main purpose of this extension is to reduce amount of SYCL language
restrictions for device code by allowing to call virtual member functions
from device functions.

NOTE: this extension **does not** cover (i.e. doesn't enable) things like
`dynamic_cast`, `typeid` or calls through function pointers.

== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification.  An implementation supporting this extension must predefine the
macro `SYCL_EXT_ONEAPI_VIRTUAL_FUNCTIONS` to one of the values defined in the
table below.  Applications can test for the existence of this macro to determine
if the implementation supports this feature, or applications can test the
macro's value to determine which of the extension's features the implementation
supports.

[%header,cols="1,5"]
|===
|Value
|Description

|1
|The APIs of this experimental extension are not versioned, so the
 feature-test macro always has this value.
|===

=== New language restrictions for device functions

The following restriction, listed in section 5.4 of the core SYCL specification
does not apply to kernels submitted with the `assume_indirect_calls_to` and
`assume_indirect_calls` properties:

> The odr-use of polymorphic classes and classes with virtual inheritance is
> allowed. *However, no virtual member functions are allowed to be called in a
> device function.*

However, there are still some limitations of how virtual member functions can
be used:

- if an object is constructed in host code, calling a virtual member function
  for that object in device code has undefined behavior;
- if an object is constructed in device code on a device `A`, calling a virtual
  member function for that object in host code, or on another device `B` has
  undefined behavior;

=== New properties

Under the hood virtual functions are essentially function pointers which are
stored in a global variable and managed by compiler-generated code. Therefore,
each call to a virtual member function is an indirect call and compiler may not
be able to understand which exact virtual function is being called (i.e. which
class it belongs to).

Without any knowledge about which virtual function can be called from which
kernels compiler will have to make all virtual functions available to all
kernels. That may not be desirable because some of those virtual functions could
use features that are prohibited in device code.

In order to help compiler to build a mapping between kernels and virtual
functions they may call, the extension introduces new compile-time-constant
properties.

[source,dpcpp]
----
namespace sycl::ext::oneapi::experimental {

  struct indirectly_callable_key {
    template <typename SetId>
    using value_t = property_value<indirectly_callable_key, SetId>;
  };

  struct calls_indirectly_key {
    template <typename First, typename... SetIds>
    using value_t = property_value<calls_indirectly_key, First, SetIds...>;
  };

  inline constexpr indirectly_callable_key::value_t<void> indirectly_callable;

  template <typename SetId>
  inline constexpr indirectly_callable_key::value_t<SetId>
      indirectly_callable_in;

  inline constexpr calls_indirectly_key::value_t<void> assume_indirect_calls;

  template <typename First, typename... Rest>
  inline constexpr calls_indirectly_key::value_t<First, Rest...>
      assume_indirect_calls_to;

  template <>
  struct is_property_key<indirectly_callable_key> : std::true_type {};
  template <> struct is_property_key<calls_indirectly_key> : std::true_type {};
}
----

Before describing those properties in more detail, a couple of new terms are
introduced to simplify the extension specification:

Set of virtual member functions:: a group of virtual member functions which are
defined with the `indirectly_callable` property and with the same value of the
property parameter `SetId`. For simplicity, this will also be further referred
to as a _set_, or as a _set of virtual functions_.

Kernel declares a use of a set of virtual member functions:: a kernel is
considered to be declaring a use of a set of virtual member functions `SetIdA`
when it is submitted with `calls_indirectly` property with `SetIdA` included
into the property parameter `SetIds`. If `SetIdA` is not included into the
property parameter `SetIds`, or if a kernel is submitted without the property,
then it is *not* considered to be declaring a use of the set of virtual member
functions.

|===
|Property|Description
|`indirectly_callable`
|This is an alias to `indirectly_callable_in<void>`, please read the description
of the `indirectly_callable_in` property for full documentation.

This property is expected to be used in situations where application is not that
huge and/or complex and therefore doesn't care about having more than one set
of virtual functions.

Going forward, the document will only reference the `indirectly_callable_in`
property, but whatever is said about it also applies to the
`indirectly_callable` property because it is a simple alias.
|`indirectly_callable_in`
|The `indirectly_callable_in` property indicates that a virtual member function
is a device function, thus making it available to be called from SYCL kernel and
device functions. Should only be applied to virtual member functions and to do
so, function-style `SYCL_EXT_ONEAPI_FUNCTION_PROPERTY` macro should be used.

NOTE: This property affect a particular function and does not impact any of its
overrides in derived classes. If the whole hierarchy of overrides is expected
to be callable from a device, then each and every override should be marked with
the property.

Parameter `SetId` specifies a set of virtual member functions this function
belongs to and at the same time it defines a group of kernels, which can call
this function, it must be a C++ typename.

Calling a virtual member function from a kernel which does not declare use of a
set the virtual member function belongs to is an undefined behavior.

The property must appear on the first declaration of the function in the
translation unit. Redeclarations of the function may optionally be decorated
with the same property if the property argument is the same. The effect is the
same regardless of whether redeclarations are so decorated.

If a function is decorated with one of these properties in one translation unit,
any other translation unit that declares the same function must also decorate
the function with the same property (with the same argument). Otherwise the
program is considered ill-formed, but no diagnostic is required.

The programs that decorate the same function with multiple instances of the
property with different argument are ill formed.
|`assume_indirect_calls`
|This is an alias to `assume_indirect_calls_to<void>`, please read the
description of the `assume_indirect_calls_to` property for full documentation.

This property is expected to be used in situations where application is not that
huge and/or complex and therefore doesn't care about having more than one set
of virtual functions.

Going forward, the document will only reference the `assume_indirect_calls_to`
property, but whatever is said about it also applies to the
`assume_indirect_calls` property because it is a simple alias.
|`assume_indirect_calls_to`
|The `assume_indirect_calls_to` property indicates that a SYCL kernel function
may perform calls through virtual member functions and declares use of one or
more sets of virtual member functions.

Parameter `SetIds` specifies which sets of virtual member functions are
declared to be used by a kernel, it must be zero or more C\++ typenames.

Calling a virtual member function, which does not belong to any of sets of
virtual member functions declared to be used is an undefined behavior.

This property should be attached to a kernel if it contains a virtual member
function call in its call graph, even if the said function is never actually
called. If a kernel submitted without this property contains a virtual member
function call in its call graph, diagnostic should be emitted by an
implementation.
|===

If a kernel is submitted with the `assume_indirect_calls_to` property that
points to an empty set of virtual functions, a synchronous exception with the
`errc::invalid` error code should be thrown by an implementation.

Applying the `indirectly_callable_in` property to a SYCL Kernel function is
illegal and an implementation should produce a diagnostic for that.

Applying the `indirectly_callable_in` property to an arbitrary device function,
which is not a virtual member function has no effect.

NOTE: This behavior may be changed in either future version of this extension or
in another extensions.

Virtual member functions that are decorated with the `indirectly_callable_in`
property are considered to be device functions, i.e. they  must obey the
restrictions listed in section 5.4 of the core SYCL specification "Language
restrictions for device functions". Virtual member functions that are not
decorated with this attribute do not need to obey these restrictions, even if
other definitions of that virtual member function in other classes in the
inheritance hierarchy are decorated with the attribute.

[source,dpcpp]
----
using syclext = sycl::ext::oneapi::experimental;

struct set_A;
struct set_B;

class Foo {
public:
  // properties to functions should be applied using the macro:
  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(
      syclext::indirectly_callable_in<set_A>) void
  foo() {}

  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(
      syclext::indirectly_callable_in<set_A>) void
  bar();

  // first declaration must be annotated
  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(
      syclext::indirectly_callable_in<set_B>) void
  baz();
};

// redeclarations may be annotated as well
void SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclext::indirectly_callable_in<set_B>)
Foo::baz() {}

// but it is not required
Foo::bar() {}

int main() {
  sycl::queue q;
  // kernel calling virtual function should also be annotated:
  q.single_task(syclext::properties{syclext::assume_indirect_calls_to<set_A>},
      [=]() {
    Foo *ptr = /* ... */;
    ptr->bar()

    // Note: this kernel can only call 'Foo::foo' and 'Foo::bar' but not
    // 'Foo::baz', because the latter is declared within a different set.
  });
}
----

The main reason for virtual functions to be split into different sets is use of
optional kernel features in those virtual functions. It is explained in more
details in the next section. However, for simplicity purposes both properties
have aliases which allow to omit the set, thus using the default set:

[source,dpcpp]
----
using syclext = sycl::ext::oneapi::experimental;

struct set_A;

class Foo {
public:
  // This virtual member function belongs to the default set of virtual
  // functions.
  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclext::indirectly_callable)
  void foo() {}

  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(
      syclext::indirectly_callable_in<set_A>)
  void bar() {}
};

int main() {
  sycl::queue q;
  // This kernel declares a use of default set of virtual functions
  q.single_task(syclext::properties{syclext::assume_indirect_calls}, [=]() {
    Foo *ptr = /* ... */;
    ptr->bar()

    // Note: this kernel can only call 'Foo::foo' but not 'Foo::bar', because
    // the latter belongs to a different (non-default) set of virtual functions.
  });
}
----

NOTE: By definition of the `indirectly_callable` and `assume_indirect_calls`
properties above, the type `void` is used to denote the default set of
virtual functions.  Applications may also explicitly use the type `void` to
denote this default set of virtual functions when using `indirectly_callable_in`
and `assume_indirect_calls_to` properties.

=== Optional kernel features handling

The core SYCL specification (5.8 Attributes for device code) says the following
in the description of `device_has` attribute for SYCL kernels and non-kernel
device functions.

When the attribute is applied to a kernel:

> \... it causes the compiler to issue a diagnostic if the kernel (or any of the
> functions it calls) uses an optional feature that is associated with an aspect
> that is not listed in the attribute.

When the attribute is applied to a function:

> \... it causes the compiler to issue a diagnostic if the device function (or
> any of the functions it calls) uses an optional feature that is associated
> with an aspect that is not listed in the attribute.

Due to dynamic nature of virtual member functions, compiler in general case is
not able to perform static analysis of a call graph in order to understand which
exact virtual functions are called from which kernels.

Instead, information from the new properties is used by an implementation to
issue such diagnostic. When determining a set of aspects which are used by a
SYCL kernel function, an implementation must take into account all aspects which
are used by all virtual member functions included into all sets of virtual
member functions declared to be used by a kernel.

Therefore, if only default set of virtual functions is used by an application,
it means that every kernel which is submitted with the
`assume_indirect_calls_to` property is assumed to use _all_ virtual functions
marked with the `indirectly_callable_in` property. If some of those virtual
functions use optional kernel features and there are kernels which are supposed
to work on devices without support for those optional kernel features, then
virtual functions using them should be outlined into a separate set.

[source,dpcpp]
----
using syclext = sycl::ext::oneapi::experimental;

struct set_fp64;
struct set_fp16;

struct Foo {
  // This function uses 'fp64' aspect
  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(
      syclext::indirectly_callable_in<set_fp64>)
  void f64() {
    double d = 3.14;
  }

  // This function uses 'fp16' aspect
  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(
      syclext::indirectly_callable_in<set_fp16>)
  void f16() {
    sycl::half h = 2.71f;
  }
};

sycl::queue q;

q.single_task(syclext::properties{syclext::assume_indirect_calls_to<set_fp16>},
    [=]() [[sycl::device_has(sycl::aspect::fp64)]] {
  // Diagnostic is required for this kernel, because it is declared as only
  // using 'fp64' aspect, but it also uses virtual member functions from
  // "set_fp16", which includes 'Foo::f16' that uses 'fp16' aspect.
});

q.single_task(syclext::properties{syclext::assume_indirect_calls_to<set_fp64>},
    [=]() [[sycl::device_has()]] {
  // Diagnostic is required for this kernel, because it is declared as not
  // using any optional features, but it also uses virtual member functions from
  // "set_fp64", which includes 'Foo::f64' that uses 'fp64' aspect.
});

q.single_task(syclext::properties{syclext::assume_indirect_calls_to<set_fp64>},
    [=]() [[sycl::device_has(sycl::aspect::fp64)]] {
  // No diagnostic is required for this kernel, because list of declared aspects
  // matches list of used aspects. That includes virtual member functions from
  // "set_fp64", which includes 'Foo::f64' that uses 'fp64' aspect
});
----

Submitting a kernel with `assume_indirect_calls_to` property, which includes
virtual member functions that use optional kernel features to a device that
doesn't support them, should result in an exception at runtime, similar to how
it is defined by the core SYCL specification.

[source,dpcpp]
----
using syclext = sycl::ext::oneapi::experimental;

struct set_A;
struct set_B;

struct Foo {
  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(
      syclext::indirectly_callable_in<set_A>)
  void foo() {
    double d = 3.14;
  }

  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(
      syclext::indirectly_callable_in<set_B>)
  void bar() {}
};

int main() {
  sycl::queue q(/* device selector returns a device *without* fp64 support */);
  assert(!q.get_device().has(sycl::aspect::fp64));

  q.single_task(syclext::properties{syclext::assume_indirect_calls_to<set_A>},
      [=]() {
    // Exception is expected to be thrown, because target device doesn't support
    // fp64 aspect and it is used by 'Foo::foo' which is included into 'set_A'
  });

  q.single_task(syclext::properties{syclext::assume_indirect_calls_to<set_B>},
      [=]() {
    // No exceptions are expected, because 'set_B' doesn't bring any
    // requirements for optional kernel features.
  });
}
----

An implementation may not raise a compile time diagnostic or a run time
exception merely due to speculative compilation of a virtual member function for
a device when the application does not specify a use of virtual member functions
through the corresponding properties.

[source,dpcpp]
----
using syclext = sycl::ext::oneapi::experimental;

struct Foo {
  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclext::indirectly_callable)
  void foo() {
    double d = 3.14;
  }

  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclext::indirectly_callable)
  void bar() {}
};

int main() {
  sycl::queue q(/* device selector choosing a device *without* fp64 support */);
  assert(!q.get_device().has(sycl::aspect::fp64));

  auto *Storage = sycl::malloc_device<Foo>(1, q);

  q.single_task([=]() {
    // The kernel is not submitted with 'calls_indirectly' property and
    // therefore it is not considered to be using any of virtual member
    // functions of 'Foo'. This means that the object of 'Foo' can be
    // successfully created by this kernel, regardless of whether a target
    // device supports 'fp64' aspect which is used by 'Foo::foo'. No exceptions
    // are expected to be thrown.
    new (Storage) Foo;
  });
}
----

==== Interaction with `reqd_sub_group_size` attribute

The `reqd_sub_group_size` attribute is a bit of a special case comparing to
other optional kernel features, because it requires to compile a kernel in a
certain way, which may require special handling for all functions which are
called from it.

When the same function is called from two or more kernels with different
`reqd_sub_group_size` attribute, it may be required for the implementation to
duplicate that function to create different versions of it tailored to different
sub-group sizes. It can be done in a straightforward manner when operating on a
static call graph.

Virtual member functions are essentially called indirectly and pointers to them
are initialized just once when an object of a polymorphic class is being
created. Therefore, to support calling such virtual member function from two or
more kernels with different `reqd_sub_group_size`, each kernel may need to
receive a different pointer to a different version of a virtual member function.

To avoid possibly posing such multi-versioning requirements on implementations,
virtual member functions can only be called from kernels with _primary_
sub-group-size as defined by
link:../proposed/sycl_ext_oneapi_named_sub_group_sizes.asciidoc[
sycl_ext_oneapi_named_sub_group_sizes] extension, or otherwise behavior is
undefined.

NOTE: for implementations that don't support
`sycl_ext_oneapi_named_sub_group_sizes` extension, virtual member functions can
only be called from kernels which *don't* have `reqd_sub_group_size` attribute
set on them explicitly, or otherwise behavior is undefined.

=== Kernel bundles and device images

When an object of a polymorphic class is constructed, it stores a pointer to
virtual table, which points to its virtual member functions. Addresses of those
functions are accessible and valid only within a kernel bundle containing a
kernel which used to construct an object.

Performing calls to virtual member functions of an object constructed in a
kernel from a different kernel bundle is an undefined behavior.

[source,dpcpp]
----
using syclext = sycl::ext::oneapi::experimental;

struct Base {
  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclext::indirectly_callable)
  void foo() {}
};

class Constructor;
class Use;

int main() {
  sycl::queue Q;

  Base *Obj = sycl::malloc_device<Base>(1, Q);
  int *Result = sycl::malloc_shared<int>(2, Q);

  auto bundleA
    = sycl::get_kernel_bundle<sycl::bundle_state::executable>(Q.get_context(),
        {sycl::get_kernel_id<Constructor>()});
  auto bundleB
    = sycl::get_kernel_bundle<sycl::bundle_state::executable>(Q.get_context(),
        {sycl::get_kernel_id<Use>()});


  Q.submit([&](sycl::handler &CGH) {
    CGH.use_kernel_bundle(bundleA);
    CGH.single_task<Constructor>([=]() {
      // Only placement new can be used within device functions.
      new (Obj) Base;
    });
  });

  Q.submit([&](sycl::handler &CGH) {
    CGH.use_kernel_bundle(bundleB);
    CGH.single_task<Use>(syclext::properties{syclext::assume_indirect_calls},
        [=]() {
      // Call to 'Base::foo' is an undefined behavior here, because 'Obj' was
      // constructed within kernel bundle `bundleA`
      Obj->foo();
    });
  });

  return 0;
}
----

If no explicit kernel bundle operations are performed by a program, it is
responsibility of a SYCL implementation to ensure that all kernels that use
virtual functions from the same set are implicitly put together into the same
kernel bundle to ensure that everything works correctly.

Note, however, that there are APIs which may require SYCL implementation to
re-compile a kernel bundle. For example, if a specialization constant value is
changed, SYCL implementation may need to re-compile a kernel bundle to embed
new value of a specialization constant into a device program. Such
re-compilation will invalidate all addresses of virtual functions which may
have been previously recorded in a constructed object making behavior of
virtual function calls through that object undefined.

Correct manipulation with specialization constants in kernels that also use
virtual functions requires advanced knowledge of implementation details and
therefore it is not recommended to use specialization constants together with
virtual functions.

== Example usage

[source,dpcpp]
----
#include <sycl/sycl.hpp>

using syclext = sycl::ext::oneapi::experimental;

class Base {
public:
  virtual SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclext::indirectly_callable)
  int get_random_number() {
    return 4; // Chosen by fair dice roll. Guaranteed to be random
  }

  // Not considered to be a device function, can use full set of C++ features
  virtual int get_host_random_number() {
    throw std::runtime_error("Not Implemented");
  }
};

class Derived : public Base {
public:
  SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(syclext::indirectly_callable)
  int get_random_number() override {
    return 221;
  }
};

int main() {
  sycl::queue Q;

  Base *Obj = sycl::malloc_device<Derived>(1, Q);
  int *Result = sycl::malloc_shared<int>(1, Q);

  Q.single_task([=]() {
    // Only placement new can be used within device functions.
    new (Obj) Derived;
  });

  auto props = syclext::properties{syclext::assume_indirect_calls};
  Q.single_task(props, [=]() {
    Base B;
    Result[0] = B.get_random_number();
  }).wait();
  assert(Result[0] == 4);

  Q.single_task(props, [=]() {
    Result[0] = Obj->get_random_number();
  }).wait();
  assert(Result[0] == 221);

  return 0;
}
----

== Issues

=== Handling of `reqd_sub_group_size` attribute

The extension allows virtual calls to be performed only from kernels with
_primary_ sub-group size, which is quite limiting and doesn't allow you to rely
on a particular sub-group size you want within a virtual function.

This is more of an implementation limitation, rather than a language problem,
because at both SPIR-V and SYCL levels we don't have a mechanism of assigning
`reqd_sub_group_size` attribute to on-kernel SYCL functions and considering
indirect nature of virtual functions, compiler may not be able to figure out
which kernels use which exact virtual functions.

By implementing some extra interfaces at SPIR-V and SYCL level we should be able
to improve the situation and lift some of the limitations around
`reqd_sub_group_size` attribute use together with virtual functions, but this
won't be a part of the initial language specification and implementation.

=== Interaction with specialization constants

Implementation of specialization constants may involve re-compilation and
therefore can easily break virtual functions functionality. Current extension
spec wording is to _discourage_ use of specialization constants together with
virtual functions, but not to completely prohibit. Should we be more clear here
maybe with the wording and make it stricter or more precise/formal?
