CUDA vs OpenCL math builtin precision

CUDA Guarantees

From Appendix E.1 of the CUDA C Programming Guide:

This section specifies the error bounds of each function when executed on the device and also when executed on the host in the case where the host does not supply the function.

The error bounds are generated from extensive but not exhaustive tests, so they are not guaranteed bounds.

The CUDA C Best Practices Guide also discusses the precision of math built-ins, in Section 11.1.5 (Math Libraries) and Section 11.1.6 (Precision-related Compiler Flags).

Single Precision

The following table uses the following sources:

In addition to the following table, the CUDA documentation also includes:

Addition and multiplication are IEEE-compliant, so have a maximum error of 0.5 ulp.

The recommended way to round a single-precision floating-point operand to an integer, with the result being a single-precision floating-point number is rintf(), not roundf(). The reason is that roundf() maps to an 8-instruction sequence on the device, whereas rintf() maps to a single instruction. truncf(), ceilf(), and floorf() each map to a single instruction as well.
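The two functions are not interchangeable at halfway cases: rintf() rounds ties to the nearest even integer (in the default rounding mode), whereas roundf() rounds ties away from zero. The Python sketch below only illustrates the two tie-breaking rules. The helper names rint_like and round_like are hypothetical, and the code works on Python floats (binary64) rather than calling any CUDA function.

```python
import math

def rint_like(x: float) -> float:
    """Ties-to-even rounding, as rintf() does in the default rounding mode."""
    if not math.isfinite(x):
        return x
    # Python's built-in round() with no ndigits rounds half to even.
    return math.copysign(float(round(x)), x)

def round_like(x: float) -> float:
    """Ties-away-from-zero rounding, as roundf() does."""
    if not math.isfinite(x):
        return x
    t = float(math.trunc(x))
    # x - t is the fractional part, which is exactly representable.
    if abs(x - t) >= 0.5:
        t += math.copysign(1.0, x)
    return math.copysign(t, x)

# The results differ only when x lies exactly halfway between two integers.
for value in (0.5, 1.5, 2.5, -2.5, 2.4, 2.6):
    print(value, rint_like(value), round_like(value))
```

Replacing roundf() with rintf() for speed is therefore only safe when the tie-breaking rule does not matter to the caller.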

OpenCL defines ULP (units in last place) as:

If x is a real number that lies between two finite consecutive floating-point numbers a and b, without being equal to one of them, then ulp(x) = |b − a|, otherwise ulp(x) is the distance between the two non-equal finite floating-point numbers nearest x. Moreover, ulp(NaN) is NaN.

Maximum error is defined in the CUDA documentation as:

The maximum error is stated as the absolute value of the difference in ulps between a correctly rounded single-precision result and the result returned by the CUDA library function.
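One way to read this definition is as a count of the representable single-precision values between the two results. The Python sketch below is written under that assumption. The helper names are hypothetical, and the reference is assumed to be a double-precision value accurate enough that rounding it to single precision gives the correctly rounded result.

```python
import struct

def f32_ordered(x: float) -> int:
    """Map x, rounded to binary32, to an integer that increases with x.

    NaN is not handled, and values beyond the binary32 range raise OverflowError.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    magnitude = bits & 0x7FFFFFFF
    # Negative values have the sign bit set; mirror them below zero.
    return -magnitude if bits & 0x80000000 else magnitude

def ulp_distance_f32(reference: float, result: float) -> int:
    """Number of binary32 steps between the rounded reference and the result."""
    return abs(f32_ordered(reference) - f32_ordered(result))

# A result one representable value above 1.0 is 1 ulp away from it.
print(ulp_distance_f32(1.0, 1.0 + 2.0 ** -23))
```

Under this reading, a correctly rounded operation has a distance of 0, which is how the table below reports addition and multiplication.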

OpenCL Built-in

OpenCL Min Accuracy (ULP)

CUDA Built-in

CUDA Maximum Error (ULP)

x + y

Correctly rounded

x + y

0 ulp (IEEE-754 round-to-nearest-even)

x - y

Correctly rounded

N/A

N/A

x * y

Correctly rounded

x * y

0 ulp (IEEE-754 round-to-nearest-even)

1.0 / x

≤ 2.5 ulp

1.0 / x

0 ulp (if compute capability ≥ 2 when compiled with -prec-div=true), 1 ulp (full range) otherwise

x / y

≤ 2.5 ulp

x / y

0 ulp (if compute capability ≥ 2 when compiled with -prec-div=true), 2 ulp (full range) otherwise

acos

≤ 4 ulp

acosf

3 ulp (full range)

acospi

≤ 5 ulp

N/A

N/A

asin

≤ 4 ulp

asinf

4 ulp (full range)

asinpi

≤ 5 ulp

N/A

N/A

atan

≤ 5 ulp

atanf

2 ulp (full range)

atan2

≤ 6 ulp

atan2f

3 ulp (full range)

atanpi

≤ 5 ulp

N/A

N/A

atan2pi

≤ 6 ulp

N/A

N/A

acosh

≤ 4 ulp

acoshf

4 ulp (full range)

asinh

≤ 4 ulp

asinhf

3 ulp (full range)

atanh

≤ 5 ulp

atanhf

3 ulp (full range)

cbrt

≤ 2 ulp

cbrtf

1 ulp (full range)

ceil

Correctly rounded

ceilf

0 ulp (full range)

copysign

0 ulp

copysignf

Undocumented.

cos

≤ 4 ulp

cosf

2 ulp (full range)

cosh

≤ 4 ulp

coshf

2 ulp (full range)

cospi

≤ 4 ulp

cospif

2 ulp (full range)

N/A

N/A

cyl_bessel_i0f

6 ulp (full range)

N/A

N/A

cyl_bessel_i1f

6 ulp (full range)

erfc

≤ 16 ulp

erfcf

4 ulp (full range)

N/A

N/A

erfcinvf

2 ulp (full range)

N/A

N/A

erfcxf

4 ulp (full range)

N/A

N/A

erfinvf

2 ulp (full range)

erf

≤ 16 ulp

erff

2 ulp (full range)

exp

≤ 3 ulp

expf

2 ulp (full range)

exp2

≤ 3 ulp

exp2f

2 ulp (full range)

exp10

≤ 3 ulp

exp10f

2 ulp (full range)

expm1

≤ 3 ulp

expm1f

1 ulp (full range)

fabs

0 ulp

fabsf

Undocumented.

fdim

Correctly rounded

fdimf

0 ulp (full range)

floor

Correctly rounded

floorf

0 ulp (full range)

fma

Correctly rounded

fmaf

0 ulp (full range)

fmax

0 ulp

fmaxf

Undocumented.

fmin

0 ulp

fminf

Undocumented.

fmod

0 ulp

fmodf

0 ulp (full range)

fract

Correctly rounded

N/A

N/A

frexp

0 ulp

frexpf

0 ulp (full range)

hypot

≤ 4 ulp

hypotf

3 ulp (full range)

ilogb

0 ulp

ilogbf

0 ulp (full range)

N/A

N/A

j0f

9 ulp for abs(x) < 8, otherwise a maximum absolute error of 2.2 x 10^(-6)

N/A

N/A

j1f

9 ulp for abs(x) < 8, otherwise a maximum absolute error of 2.2 x 10^(-6)

N/A

N/A

jnf

For n = 128, a maximum absolute error of 2.2 x 10^(-6)

ldexp

Correctly rounded

ldexpf

0 ulp (full range)

N/A

N/A

lgammaf

6 ulp (outside interval -10.001 ... -2.264; larger inside)

log

≤ 3 ulp

logf

1 ulp (full range)

log2

≤ 3 ulp

log2f

1 ulp (full range)

log10

≤ 3 ulp

log10f

2 ulp (full range)

log1p

≤ 2 ulp

log1pf

1 ulp (full range)

logb

0 ulp

logbf

0 ulp (full range)

N/A

N/A

lrintf

0 ulp (full range)

N/A

N/A

lroundf

0 ulp (full range)

N/A

N/A

llrintf

0 ulp (full range)

N/A

N/A

llroundf

0 ulp (full range)

mad

Any value allowed (infinite ulp)

N/A

N/A

maxmag

0 ulp

N/A

N/A

minmag

0 ulp

N/A

N/A

modf

0 ulp

modff

0 ulp (full range)

nan

0 ulp

nanf

Undocumented.

N/A

N/A

nearbyintf

0 ulp (full range)

nextafter

0 ulp

nextafterf

Undocumented.

N/A

N/A

normf

4 ulp (full range)

N/A

N/A

normcdff

5 ulp (full range)

N/A

N/A

normcdfinvf

5 ulp (full range)

N/A

N/A

norm3df

3 ulp (full range)

N/A

N/A

norm4df

3 ulp (full range)

pow(x, y)

≤ 16 ulp

powf

8 ulp (full range)

pown(x, y)

≤ 16 ulp

N/A

N/A

powr(x, y)

≤ 16 ulp

N/A

N/A

N/A

N/A

rcbrtf

1 ulp (full range)

N/A

N/A

rhypotf

2 ulp (full range)

N/A

N/A

rnormf

3 ulp (full range)

N/A

N/A

rnorm3df

2 ulp (full range)

N/A

N/A

rnorm4df

2 ulp (full range)

remainder

0 ulp

remainderf

0 ulp (full range)

remquo

0 ulp

remquof

0 ulp (full range)

rint

Correctly rounded

rintf

0 ulp (full range)

rootn

≤ 16 ulp

N/A

N/A

round

Correctly rounded

roundf

0 ulp (full range)

rsqrt

≤ 2 ulp

rsqrtf

2 ulp (full range) (applies to 1 / sqrtf(x) only when converted to rsqrtf by compiler)

N/A

N/A

scalbnf

0 ulp (full range)

N/A

N/A

scalblnf

0 ulp (full range)

sin

≤ 4 ulp

sinf

2 ulp (full range)

sincos

≤ 4 ulp for sine and cosine values

sincosf

2 ulp (full range)

N/A

N/A

sincospif

2 ulp (full range)

sinh

≤ 4 ulp

sinhf

3 ulp (full range)

sinpi

≤ 4 ulp

sinpif

2 ulp (full range)

sqrt

≤ 3 ulp

sqrtf

0 ulp (when compiled with -prec-sqrt=true); otherwise 1 ulp if compute capability ≥ 5.2 and 3 ulp otherwise

tan

≤ 5 ulp

tanf

4 ulp (full range)

tanh

≤ 5 ulp

tanhf

2 ulp (full range)

tanpi

≤ 6 ulp

N/A

N/A

tgamma

≤ 16 ulp

tgammaf

11 ulp (full range)

trunc

Correctly rounded

truncf

0 ulp (full range)

N/A

N/A

y0f

9 ulp for abs(x) < 8, otherwise a maximum absolute error of 2.2 x 10^(-6)

N/A

N/A

y1f

9 ulp for abs(x) < 8, otherwise a maximum absolute error of 2.2 x 10^(-6)

N/A

N/A

ynf

ceil(2 + 2.5n) ulp for abs(x) < n, otherwise a maximum absolute error of 2.2 x 10^(-6)

N/A

N/A

isfinite

N/A

N/A

N/A

isinf

N/A

N/A

N/A

isnan

N/A

N/A

N/A

signbit

N/A
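The table can be used to check whether CUDA's documented maximum error stays within OpenCL's minimum accuracy requirement for a given built-in. The Python sketch below does this for a handful of rows copied from the table. The names ROWS and exceeds_opencl_bound are hypothetical, and because the CUDA bounds are not guaranteed bounds, a passing check is only indicative.

```python
# name: (OpenCL minimum accuracy in ulp, CUDA documented maximum error in ulp)
# Values are copied from a few rows of the single-precision table.
ROWS = {
    "sin / sinf": (4.0, 2.0),
    "tan / tanf": (5.0, 4.0),
    "pow / powf": (16.0, 8.0),
    "tgamma / tgammaf": (16.0, 11.0),
    "x / y (without -prec-div=true)": (2.5, 2.0),
    "sqrt / sqrtf (without -prec-sqrt=true, compute capability < 5.2)": (3.0, 3.0),
}

def exceeds_opencl_bound(rows):
    """Return the names whose CUDA bound is looser than the OpenCL requirement."""
    return [name for name, (opencl_ulp, cuda_ulp) in rows.items()
            if cuda_ulp > opencl_ulp]

# None of the rows listed above exceeds the OpenCL requirement.
print(exceeds_opencl_bound(ROWS))
```

Rows marked N/A or Undocumented have no numeric bound to compare and would need to be handled separately.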

OpenCL’s native_ math built-ins map to the same CUDA built-in as the equivalent non-native_ OpenCL built-in and the precision is implementation-defined:

In section 7.4 of the OpenCL 2.1 Specification, mad has a different requirement, namely:

Implemented either as a correctly rounded fma or as a multiply followed by an add both of which are correctly rounded.
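The two permitted implementations can give different results, because a fused multiply-add rounds once and a separate multiply and add round twice. The Python sketch below illustrates the difference on Python floats (binary64, not single precision). The function names are hypothetical, and the fused variant is emulated with exact rational arithmetic rather than a hardware fma.

```python
from fractions import Fraction

def mad_unfused(a: float, b: float, c: float) -> float:
    """Multiply, round, then add, round: two correctly rounded operations."""
    return a * b + c

def mad_fused(a: float, b: float, c: float) -> float:
    """Compute a * b + c exactly, then round once, as a correctly rounded fma would.

    Finite inputs only. Converting a Fraction to float rounds to nearest.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# a * b equals 1 - 2**-60 exactly, which rounds to 1.0 before the addition.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
print(mad_unfused(a, b, c), mad_fused(a, b, c))
```

Code that must produce identical results on both platforms therefore cannot rely on mad and would need an explicit fma or an explicit multiply followed by an add.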

The precision of SPIR-V math instructions for use in an OpenCL environment can be found in this document.

Double Precision

The following table uses the following sources:

CUDA defines maximum error in the same way as for single precision, and also includes:

The recommended way to round a double-precision floating-point operand to an integer, with the result being a double-precision floating-point number is rint(), not round(). The reason is that round() maps to an 8-instruction sequence on the device, whereas rint() maps to a single instruction. trunc(), ceil(), and floor() each map to a single instruction as well.

Only differences from single precision are included. On the OpenCL side, the only changes are to 1.0 / x, x / y and sqrt. On the CUDA side, every built-in name changes, and so do many of the error bounds.

OpenCL Built-in

OpenCL Min Accuracy (ULP)

CUDA Built-in

CUDA Maximum Error (ULP)

x + y

Correctly rounded

x + y

0 ulp (IEEE-754 round-to-nearest-even)

x - y

Correctly rounded

N/A

N/A

x * y

Correctly rounded

x * y

0 ulp (IEEE-754 round-to-nearest-even)

1.0 / x

Correctly rounded

1.0 / x

0 ulp (IEEE-754 round-to-nearest-even)

x / y

Correctly rounded

x / y

0 ulp (IEEE-754 round-to-nearest-even)

acos

≤ 4 ulp

acos

1 ulp (full range)

acospi

≤ 5 ulp

N/A

N/A

asin

≤ 4 ulp

asin

2 ulp (full range)

asinpi

≤ 5 ulp

N/A

N/A

atan

≤ 5 ulp

atan

2 ulp (full range)

atan2

≤ 6 ulp

atan2

2 ulp (full range)

atanpi

≤ 5 ulp

N/A

N/A

atan2pi

≤ 6 ulp

N/A

N/A

acosh

≤ 4 ulp

acosh

2 ulp (full range)

asinh

≤ 4 ulp

asinh

2 ulp (full range)

atanh

≤ 5 ulp

atanh

2 ulp (full range)

cbrt

≤ 2 ulp

cbrt

1 ulp (full range)

ceil

Correctly rounded

ceil

0 ulp (full range)

copysign

0 ulp

copysign

Undocumented.

cos

≤ 4 ulp

cos

1 ulp (full range)

cosh

≤ 4 ulp

cosh

1 ulp (full range)

cospi

≤ 4 ulp

cospi

1 ulp (full range)

N/A

N/A

cyl_bessel_i0

6 ulp (full range)

N/A

N/A

cyl_bessel_i1

6 ulp (full range)

erfc

≤ 16 ulp

erfc

4 ulp (full range)

N/A

N/A

erfcinv

6 ulp (full range)

N/A

N/A

erfcx

3 ulp (full range)

N/A

N/A

erfinv

5 ulp (full range)

erf

≤ 16 ulp

erf

2 ulp (full range)

exp

≤ 3 ulp

exp

1 ulp (full range)

exp2

≤ 3 ulp

exp2

1 ulp (full range)

exp10

≤ 3 ulp

exp10

1 ulp (full range)

expm1

≤ 3 ulp

expm1

1 ulp (full range)

fabs

0 ulp

fabs

Undocumented.

fdim

Correctly rounded

fdim

0 ulp (full range)

floor

Correctly rounded

floor

0 ulp (full range)

fma

Correctly rounded

fma

0 ulp (IEEE-754 round-to-nearest-even)

fmax

0 ulp

fmax

Undocumented.

fmin

0 ulp

fmin

Undocumented.

fmod

0 ulp

fmod

0 ulp (full range)

fract

Correctly rounded

N/A

N/A

frexp

0 ulp

frexp

0 ulp (full range)

hypot

≤ 4 ulp

hypot

2 ulp (full range)

ilogb

0 ulp

ilogb

0 ulp (full range)

N/A

N/A

j0

7 ulp for abs(x) < 8, otherwise a maximum absolute error of 5 x 10^(-12)

N/A

N/A

j1

7 ulp for abs(x) < 8, otherwise a maximum absolute error of 5 x 10^(-12)

N/A

N/A

jn

For n = 128, a maximum absolute error of 5 x 10^(-12)

ldexp

Correctly rounded

ldexp

0 ulp (full range)

N/A

N/A

lgamma

4 ulp (outside interval -11.0001 ... -2.2637; larger inside)

log

≤ 3 ulp

log

1 ulp (full range)

log2

≤ 3 ulp

log2

1 ulp (full range)

log10

≤ 3 ulp

log10

1 ulp (full range)

log1p

≤ 2 ulp

log1p

1 ulp (full range)

logb

0 ulp

logb

0 ulp (full range)

N/A

N/A

lrint

0 ulp (full range)

N/A

N/A

lround

0 ulp (full range)

N/A

N/A

llrint

0 ulp (full range)

N/A

N/A

llround

0 ulp (full range)

mad

Any value allowed (infinite ulp)

N/A

N/A

maxmag

0 ulp

N/A

N/A

minmag

0 ulp

N/A

N/A

modf

0 ulp

mod (might be called modf, the documentation is inconsistent)

0 ulp (full range)

nan

0 ulp

nan

Undocumented.

N/A

N/A

nearbyint

0 ulp (full range)

nextafter

0 ulp

nextafter

Undocumented.

N/A

N/A

norm

3 ulp (full range)

N/A

N/A

normcdf

5 ulp (full range)

N/A

N/A

normcdfinv

7 ulp (full range)

N/A

N/A

norm3d

2 ulp (full range)

N/A

N/A

norm4d

2 ulp (full range)

pow(x, y)

≤ 16 ulp

pow

2 ulp (full range)

pown(x, y)

≤ 16 ulp

N/A

N/A

powr(x, y)

≤ 16 ulp

N/A

N/A

N/A

N/A

rcbrt

1 ulp (full range)

N/A

N/A

rhypot

1 ulp (full range)

N/A

N/A

rnorm

2 ulp (full range)

N/A

N/A

rnorm3d

1 ulp (full range)

N/A

N/A

rnorm4d

1 ulp (full range)

remainder

0 ulp

remainder

0 ulp (full range)

remquo

0 ulp

remquo

0 ulp (full range)

rint

Correctly rounded

rint

0 ulp (full range)

rootn

≤ 16 ulp

N/A

N/A

round

Correctly rounded

round

0 ulp (full range)

rsqrt

≤ 2 ulp

rsqrt

1 ulp (full range)

N/A

N/A

scalbn

0 ulp (full range)

N/A

N/A

scalbln

0 ulp (full range)

sin

≤ 4 ulp

sin

1 ulp (full range)

sincos

≤ 4 ulp for sine and cosine values

sincos

1 ulp (full range)

N/A

N/A

sincospi

1 ulp (full range)

sinh

≤ 4 ulp

sinh

1 ulp (full range)

sinpi

≤ 4 ulp

sinpi

1 ulp (full range)

sqrt

Correctly rounded

sqrt

0 ulp (IEEE-754 round-to-nearest-even)

tan

≤ 5 ulp

tan

2 ulp (full range)

tanh

≤ 5 ulp

tanh

1 ulp (full range)

tanpi

≤ 6 ulp

N/A

N/A

tgamma

≤ 16 ulp

tgamma

8 ulp (full range)

trunc

Correctly rounded

trunc

0 ulp (full range)

N/A

N/A

y0

7 ulp for abs(x) < 8, otherwise a maximum absolute error of 5 x 10^(-12)

N/A

N/A

y1

7 ulp for abs(x) < 8, otherwise a maximum absolute error of 5 x 10^(-12)

N/A

N/A

yn

For abs(x) > 1.5n, a maximum absolute error of 5 x 10^(-12)

N/A

N/A

isfinite

N/A

N/A

N/A

isinf

N/A

N/A

N/A

isnan

N/A

N/A

N/A

signbit

N/A

Half Precision

The following tables use the following sources:

CUDA doesn’t specify the ULP values for any of its half precision math builtins:

OpenCL Built-in

OpenCL Min Accuracy (ULP)

CUDA Built-in

CUDA Maximum Error (ULP)

N/A

N/A

__hadd

Undocumented (only specifies “round-to-nearest-even mode”)

N/A

N/A

__hadd_sat

Undocumented (only specifies “round-to-nearest-even mode”)

N/A

N/A

hceil

Undocumented

half_cos

≤ 8192 ulp

hcos

Undocumented (only specifies “round-to-nearest-even mode”)

half_divide

≤ 8192 ulp

__hdiv

Undocumented (only specifies “round-to-nearest mode”)

N/A

N/A

__heq

Undocumented

N/A

N/A

__hequ

Undocumented

half_exp

≤ 8192 ulp

hexp

Undocumented (only specifies “round-to-nearest-even mode”)

half_exp2

≤ 8192 ulp

hexp2

Undocumented (only specifies “round-to-nearest-even mode”)

half_exp10

≤ 8192 ulp

hexp10

Undocumented (only specifies “round-to-nearest-even mode”)

N/A

N/A

hfloor

Undocumented

N/A

N/A

__hfma

Undocumented (only specifies “round-to-nearest-even mode”)

N/A

N/A

__hfma_sat

Undocumented (only specifies “round-to-nearest-even mode”)

N/A

N/A

__hge

Undocumented

N/A

N/A

__hgeu

Undocumented

N/A

N/A

__hgt

Undocumented

N/A

N/A

__hgtu

Undocumented

N/A

N/A

__hisinf

Undocumented

N/A

N/A

__hisnan

Undocumented

N/A

N/A

__hle

Undocumented

N/A

N/A

__hleu

Undocumented

half_log

≤ 8192 ulp

hlog

Undocumented (only specifies “round-to-nearest-even mode”)

half_log2

≤ 8192 ulp

hlog2

Undocumented (only specifies “round-to-nearest-even mode”)

half_log10

≤ 8192 ulp

hlog10

Undocumented (only specifies “round-to-nearest-even mode”)

N/A

N/A

__hlt

Undocumented

N/A

N/A

__hltu

Undocumented

N/A

N/A

__hmul

Undocumented (only specifies “round-to-nearest mode”)

N/A

N/A

__hmul_sat

Undocumented (only specifies “round-to-nearest mode”)

N/A

N/A

__hneg

Undocumented

N/A

N/A

__hne

Undocumented

N/A

N/A

__hneu

Undocumented

half_powr

≤ 8192 ulp

N/A

N/A

half_recip

≤ 8192 ulp

hrcp

Undocumented (only specifies “round-to-nearest-even mode”)

N/A

N/A

hrint

Undocumented (only specifies “halfway cases rounded to nearest even integer value”)

half_rsqrt

≤ 8192 ulp

hrsqrt

Undocumented (only specifies “round-to-nearest mode”)

half_sin

≤ 8192 ulp

hsin

Undocumented (only specifies “round-to-nearest-even mode”)

half_sqrt

≤ 8192 ulp

hsqrt

Undocumented (only specifies “round-to-nearest-even mode”)

N/A

N/A

__hsub

Undocumented (only specifies “round-to-nearest mode”)

N/A

N/A

__hsub_sat

Undocumented (only specifies “round-to-nearest mode”)

half_tan

≤ 8192 ulp

N/A

N/A

N/A

N/A

htrunc

Undocumented

CUDA also defines math builtins that operate on a half2 type, for which there is no OpenCL equivalent:

CUDA Built-in

CUDA Maximum Error (ULP)

__h2div

Undocumented (only specifies “round-to-nearest mode”)

__hadd2_sat

Undocumented (only specifies “round-to-nearest-even mode”)

__hadd2

Undocumented (only specifies “round-to-nearest-even mode”)

__hbeq2

Undocumented

__hbequ2

Undocumented

__hbge2

Undocumented

__hbgeu2

Undocumented

__hbgt2

Undocumented

__hbgtu2

Undocumented

__hble2

Undocumented

__hbleu2

Undocumented

__hblt2

Undocumented

__hbltu2

Undocumented

__hbne2

Undocumented

__hbneu2

Undocumented

__heq2

Undocumented

__hequ2

Undocumented

__hfma2_sat

Undocumented (only specifies “round-to-nearest-even mode”)

__hfma2

Undocumented (only specifies “round-to-nearest-even mode”)

__hge2

Undocumented

__hgeu2

Undocumented

__hgt2

Undocumented

__hgtu2

Undocumented

__hisnan2

Undocumented

__hle2

Undocumented

__hleu2

Undocumented

__hlt2

Undocumented

__hltu2

Undocumented

__hmul2_sat

Undocumented (only specifies “round-to-nearest-even mode”)

__hmul2

Undocumented (only specifies “round-to-nearest-even mode”)

__hne2

Undocumented

__hneg2

Undocumented

__hneu2

Undocumented

__hsub2_sat

Undocumented (only specifies “round-to-nearest-even mode”)

__hsub2

Undocumented (only specifies “round-to-nearest-even mode”)

h2ceil

Undocumented

h2cos

Undocumented (only specifies “round-to-nearest-even mode”)

h2exp10

Undocumented (only specifies “round-to-nearest-even mode”)

h2exp2

Undocumented (only specifies “round-to-nearest-even mode”)

h2exp

Undocumented (only specifies “round-to-nearest mode”)

h2floor

Undocumented

h2log10

Undocumented (only specifies “round-to-nearest-even mode”)

h2log2

Undocumented (only specifies “round-to-nearest-even mode”)

h2log

Undocumented (only specifies “round-to-nearest-even mode”)

h2rcp

Undocumented (only specifies “round-to-nearest-even mode”)

h2rint

Undocumented (only specifies “halfway cases rounded to nearest even integer value”)

h2rsqrt

Undocumented (only specifies “round-to-nearest-even mode”)

h2trunc

Undocumented

Further, CUDA defines conversion and data movement functions:

CUDA Built-in

CUDA Maximum Error (ULP)

__float22half2_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__float2half2_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__float2half_rd

Undocumented (only specifies “round-down mode”)

__float2half_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__float2half_ru

Undocumented (only specifies “round-up mode”)

__float2half_rz

Undocumented (only specifies “round-towards-zero mode”)

__float2half

Undocumented (only specifies “round-to-nearest-even mode”)

__floats2half2_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__half22float2

Undocumented

__half2float

Undocumented

__half2half2

Undocumented

__half2int_rd

Undocumented (only specifies “round-down mode”)

__half2int_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__half2int_ru

Undocumented (only specifies “round-up mode”)

__half2int_rz

Undocumented (only specifies “round-towards-zero mode”)

__half2ll_rd

Undocumented (only specifies “round-down mode”)

__half2ll_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__half2ll_ru

Undocumented (only specifies “round-up mode”)

__half2ll_rz

Undocumented (only specifies “round-towards-zero mode”)

__half2short_rd

Undocumented (only specifies “round-down mode”)

__half2short_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__half2short_ru

Undocumented (only specifies “round-up mode”)

__half2short_rz

Undocumented (only specifies “round-towards-zero mode”)

__half2uint_rd

Undocumented (only specifies “round-down mode”)

__half2uint_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__half2uint_ru

Undocumented (only specifies “round-up mode”)

__half2uint_rz

Undocumented (only specifies “round-towards-zero mode”)

__half2ull_rd

Undocumented (only specifies “round-down mode”)

__half2ull_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__half2ull_ru

Undocumented (only specifies “round-up mode”)

__half2ull_rz

Undocumented (only specifies “round-towards-zero mode”)

__half2ushort_rd

Undocumented (only specifies “round-down mode”)

__half2ushort_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__half2ushort_ru

Undocumented (only specifies “round-up mode”)

__half2ushort_rz

Undocumented (only specifies “round-towards-zero mode”)

__half_as_short

Undocumented

__half_as_ushort

Undocumented

__halves2half2

Undocumented

__high2float

Undocumented

__high2half2

Undocumented

__high2half

Undocumented

__highs2half2

Undocumented

__int2half_rd

Undocumented (only specifies “round-down mode”)

__int2half_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__int2half_ru

Undocumented (only specifies “round-up mode”)

__int2half_rz

Undocumented (only specifies “round-towards-zero mode”)

__ll2half_rd

Undocumented (only specifies “round-down mode”)

__ll2half_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__ll2half_ru

Undocumented (only specifies “round-up mode”)

__ll2half_rz

Undocumented (only specifies “round-towards-zero mode”)

__low2float

Undocumented

__low2half2

Undocumented

__low2half

Undocumented

__lowhigh2highlow

Undocumented

__lows2half2

Undocumented

__shfl_down_sync

Undocumented

__shfl_sync

Undocumented

__shfl_up_sync

Undocumented

__shfl_xor_sync

Undocumented

__short2half_rd

Undocumented (only specifies “round-down mode”)

__short2half_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__short2half_ru

Undocumented (only specifies “round-up mode”)

__short2half_rz

Undocumented (only specifies “round-towards-zero mode”)

__short_as_half

Undocumented

__uint2half_rd

Undocumented (only specifies “round-down mode”)

__uint2half_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__uint2half_ru

Undocumented (only specifies “round-up mode”)

__uint2half_rz

Undocumented (only specifies “round-towards-zero mode”)

__ull2half_rd

Undocumented (only specifies “round-down mode”)

__ull2half_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__ull2half_ru

Undocumented (only specifies “round-up mode”)

__ull2half_rz

Undocumented (only specifies “round-towards-zero mode”)

__ushort2half_rd

Undocumented (only specifies “round-down mode”)

__ushort2half_rn

Undocumented (only specifies “round-to-nearest-even mode”)

__ushort2half_ru

Undocumented (only specifies “round-up mode”)

__ushort2half_rz

Undocumented (only specifies “round-towards-zero mode”)

__ushort_as_half

Undocumented
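Although no ULP bound is given for the conversions, the stated rounding mode fixes the result for every in-range input. The Python sketch below illustrates what “round-to-nearest-even mode” means for a float-to-half conversion. The helper name float_to_half_rn is hypothetical, it relies on Python's struct format "e" (IEEE 754 binary16) rather than any CUDA function, and it starts from a Python float (binary64) rather than a single-precision value.

```python
import struct

def float_to_half_rn(x: float) -> float:
    """Round x to the nearest binary16 value, ties to even, returned as a float.

    Unlike a hardware conversion, struct raises OverflowError for values too
    large for binary16 instead of returning infinity.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Between 2048 and 4096 the binary16 spacing is 2, so odd integers are ties.
# Each tie goes to the neighbour whose significand is even.
for value in (2049.0, 2051.0, 0.1):
    print(value, float_to_half_rn(value))
```

The directed modes (round-down, round-up and round-towards-zero) would instead always pick the lower, upper or smaller-magnitude neighbour, respectively.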