Deep Neural Network Library (DNNL)
1.2.0

Performance library for Deep Learning

Eltwise

The eltwise primitive applies an operation to every element of the tensor:

\[ dst(\overline{x}) = Operation(src(\overline{x})), \]

where \(\overline{x} = (x_n, \ldots, x_0)\).
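As a rough illustration of this definition, the following plain C++ sketch (not DNNL code; `eltwise_apply` is a hypothetical helper) applies a scalar operation to every element of a tensor stored as a flat buffer. Because the eltwise primitive attaches no meaning to logical dimensions, only the element count matters, not the tensor's shape.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Reference sketch, not DNNL code: apply a scalar operation to every element.
// The tensor is handled as a flat buffer because eltwise attaches no meaning
// to any logical dimension; only the total number of elements matters.
void eltwise_apply(const std::vector<float> &src, std::vector<float> &dst,
                   const std::function<float(float)> &op) {
    assert(src.size() == dst.size());  // src and dst describe the same tensor
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = op(src[i]);  // dst(x) = Operation(src(x))
}
```

Since each output element depends only on the input element at the same position, passing the same vector as `src` and `dst` also works.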

The following operations are supported:

Operation | DNNL algorithm kind | Formula |
---|---|---|
abs | dnnl_eltwise_abs | \( f(x) = \begin{cases} x & \text{if}\ x > 0 \\ -x & \text{if}\ x \leq 0 \end{cases} \) |
bounded_relu | dnnl_eltwise_bounded_relu | \( f(x) = \begin{cases} \alpha & \text{if}\ x > \alpha, \alpha \geq 0 \\ x & \text{if}\ 0 < x \leq \alpha \\ 0 & \text{if}\ x \leq 0 \end{cases} \) |
clip | dnnl_eltwise_clip | \( f(x) = \begin{cases} \beta & \text{if}\ x > \beta, \beta \geq \alpha \\ x & \text{if}\ \alpha < x \leq \beta \\ \alpha & \text{if}\ x \leq \alpha \end{cases} \) |
elu | dnnl_eltwise_elu | \( f(x) = \begin{cases} x & \text{if}\ x > 0 \\ \alpha (e^x - 1) & \text{if}\ x \leq 0 \end{cases} \) |
exp | dnnl_eltwise_exp | \( f(x) = e^x \) |
gelu | dnnl_eltwise_gelu | \( f(x) = 0.5 x \left(1 + \tanh\left[\sqrt{\frac{2}{\pi}} (x + 0.044715 x^3)\right]\right) \) |
linear | dnnl_eltwise_linear | \( f(x) = \alpha x + \beta \) |
log | dnnl_eltwise_log | \( f(x) = \log_{e}{x} \) |
logistic | dnnl_eltwise_logistic | \( f(x) = \frac{1}{1+e^{-x}} \) |
relu | dnnl_eltwise_relu | \( f(x) = \begin{cases} x & \text{if}\ x > 0 \\ \alpha x & \text{if}\ x \leq 0 \end{cases} \) |
soft_relu | dnnl_eltwise_soft_relu | \( f(x) = \log_{e}(1+e^x) \) |
sqrt | dnnl_eltwise_sqrt | \( f(x) = \sqrt{x} \) |
square | dnnl_eltwise_square | \( f(x) = x^2 \) |
swish | dnnl_eltwise_swish | \( f(x) = \frac{x}{1+e^{-\alpha x}} \) |
tanh | dnnl_eltwise_tanh | \( f(x) = \tanh{x} \) |
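The formulas in the table can be collected into a scalar reference function. This is a minimal sketch in plain C++ that mirrors the table using f32 arithmetic; the `alg` enum and `eltwise_ref` are hypothetical names for this example, not the DNNL algorithm kinds or the library's implementation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical enum for this sketch; DNNL itself uses the dnnl_eltwise_* kinds.
enum class alg { abs, bounded_relu, clip, elu, exp, gelu, linear, log,
                 logistic, relu, soft_relu, sqrt, square, swish, tanh };

// Scalar reference of the table's formulas in f32 arithmetic.
// alpha and beta are simply ignored by operations that do not use them.
float eltwise_ref(alg a, float x, float alpha, float beta) {
    switch (a) {
    case alg::abs: return x > 0 ? x : -x;
    case alg::bounded_relu:
        assert(alpha >= 0);  // the formula requires alpha >= 0
        return x > alpha ? alpha : (x > 0 ? x : 0.f);
    case alg::clip:
        assert(beta >= alpha);  // the formula requires beta >= alpha
        return x > beta ? beta : (x > alpha ? x : alpha);
    case alg::elu: return x > 0 ? x : alpha * (std::exp(x) - 1.f);
    case alg::exp: return std::exp(x);
    case alg::gelu: {
        const float k = std::sqrt(2.f / 3.14159265358979f);
        return 0.5f * x * (1.f + std::tanh(k * (x + 0.044715f * x * x * x)));
    }
    case alg::linear: return alpha * x + beta;
    case alg::log: return std::log(x);
    case alg::logistic: return 1.f / (1.f + std::exp(-x));
    case alg::relu: return x > 0 ? x : alpha * x;
    case alg::soft_relu: return std::log(1.f + std::exp(x));
    case alg::sqrt: return std::sqrt(x);
    case alg::square: return x * x;
    case alg::swish: return x / (1.f + std::exp(-alpha * x));
    case alg::tanh: return std::tanh(x);
    }
    return x;  // not reached for valid enum values
}
```

A production implementation would vectorize these loops and may use approximations; the sketch only pins down what each formula computes.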

There is no difference between the dnnl_forward_training and dnnl_forward_inference propagation kinds.

The backward propagation computes \(diff\_src(\overline{x})\) based on \(diff\_dst(\overline{x})\) and \(src(\overline{x})\).
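By the chain rule, \(diff\_src(\overline{x}) = diff\_dst(\overline{x}) \cdot f'(src(\overline{x}))\). The plain C++ sketch below applies this to ReLU and logistic; the function names are hypothetical, and evaluating the ReLU derivative at \(x = 0\) with the \(x \leq 0\) branch of the forward formula is an assumption of this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference sketch of eltwise backward: diff_src = diff_dst * f'(src).
// The derivative is evaluated at src, which is why src is an input here.
// Assumption: at src == 0 the ReLU derivative uses the x <= 0 branch.
float relu_bwd(float diff_dst, float src, float alpha) {
    return src > 0 ? diff_dst : alpha * diff_dst;
}

float logistic_bwd(float diff_dst, float src) {
    const float s = 1.f / (1.f + std::exp(-src));  // logistic(src)
    return diff_dst * s * (1.f - s);               // f'(x) = f(x) (1 - f(x))
}

// Whole-tensor ReLU backward; diff_src may alias diff_dst (in-place).
void relu_bwd_all(const std::vector<float> &src, const std::vector<float> &diff_dst,
                  std::vector<float> &diff_src, float alpha) {
    assert(src.size() == diff_dst.size() && diff_dst.size() == diff_src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        diff_src[i] = relu_bwd(diff_dst[i], src[i], alpha);
}
```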

- All eltwise primitives have a common initialization function (e.g., dnnl::eltwise_forward::desc::desc()) that takes both parameters \(\alpha\) and \(\beta\). These parameters are ignored by operations that do not use them.
- The memory format and data type for `src` and `dst` are assumed to be the same, and in the API they are typically referred to as `data` (e.g., see `data_desc` in dnnl::eltwise_forward::desc::desc()). The same holds for `diff_src` and `diff_dst`; the corresponding memory descriptors are referred to as `diff_data_desc`.
- Both forward and backward propagation support in-place operations, meaning that `src` can be used as both input and output for forward propagation, and `diff_dst` can be used as both input and output for backward propagation. In case of an in-place operation, the original data is overwritten.
- For some operations it might be beneficial for performance to compute backward propagation based on \(dst(\overline{x})\) rather than on \(src(\overline{x})\). However, for other operations this is simply impossible, so for generality the library always requires \(src\).
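To see why computing backward from \(dst\) works only for some operations, compare logistic, whose derivative \(f'(x) = f(x)(1 - f(x))\) depends on \(dst\) alone, with abs, where \(dst = |x|\) discards the sign that the derivative depends on. The helpers below are hypothetical plain C++ illustrations, not DNNL functions.

```cpp
#include <cassert>
#include <cmath>

// logistic: f'(x) = f(x) (1 - f(x)), so the gradient can be computed from dst.
float logistic_bwd_from_dst(float diff_dst, float dst) {
    assert(dst >= 0.f && dst <= 1.f);  // dst must be a logistic output
    return diff_dst * dst * (1.f - dst);
}

float logistic_bwd_from_src(float diff_dst, float src) {
    const float s = 1.f / (1.f + std::exp(-src));
    return diff_dst * s * (1.f - s);
}

// abs: following the table's case split, f'(x) = +1 for x > 0 and -1 for
// x <= 0. dst = |x| loses the sign, so this gradient needs src.
float abs_bwd_from_src(float diff_dst, float src) {
    return src > 0 ? diff_dst : -diff_dst;
}
```

For instance, \(src = 2\) and \(src = -2\) both produce \(dst = 2\) for abs, yet their gradients have opposite signs, so no function of \(dst\) alone can recover them.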

Note: For the ReLU operation with \(\alpha = 0\), \(dst\) can be used in place of \(src\) when backward propagation is computed. This enables several performance optimizations (see the tips below).
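This works because, with \(\alpha = 0\), \(dst(\overline{x}) > 0\) exactly when \(src(\overline{x}) > 0\). The minimal plain C++ sketch below (hypothetical helper names, not DNNL API) runs a forward pass that overwrites its input in place, followed by a backward pass that uses only \(dst\).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ReLU (alpha = 0) forward, done in place: src is overwritten by dst.
void relu0_fwd_inplace(std::vector<float> &data) {
    for (float &v : data) v = v > 0 ? v : 0.f;
}

// Backward using dst instead of src: for alpha = 0, dst > 0 iff src > 0,
// so the gradient mask is the same as the one computed from src.
void relu0_bwd_from_dst(const std::vector<float> &dst,
                        const std::vector<float> &diff_dst,
                        std::vector<float> &diff_src) {
    assert(dst.size() == diff_dst.size() && dst.size() == diff_src.size());
    for (std::size_t i = 0; i < dst.size(); ++i)
        diff_src[i] = dst[i] > 0 ? diff_dst[i] : 0.f;
}
```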

The eltwise primitive supports the following combinations of data types:

Propagation | Source / Destination | Intermediate data type |
---|---|---|
forward / backward | f32, bf16 | f32 |
forward | f16 | f16 |
forward | s32 / s8 / u8 | f32 |

Warning: There might be hardware- and/or implementation-specific restrictions. Check the Implementation Limitations section below.

Here the intermediate data type means that the values coming in are first converted to the intermediate data type, then the operation is applied, and finally the result is converted to the output data type.
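A minimal sketch of this rule for an s8 tensor with an f32 intermediate type follows. The helper name is hypothetical, and the final conversion (round to nearest even, then saturate to \([-128, 127]\)) is an assumption of this example; refer to Data Types for the library's actual conversion rules.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Sketch: s8 source/destination with an f32 intermediate data type.
// Assumption: the f32 result is rounded to nearest (even) in the default
// floating-point rounding mode and saturated to the s8 range.
std::int8_t eltwise_s8_via_f32(std::int8_t src, float (*op)(float)) {
    assert(op != nullptr);
    const float x = static_cast<float>(src);               // 1. s8 -> f32
    const float y = op(x);                                  // 2. apply in f32
    const float r = std::nearbyint(y);                      // 3. round ...
    const float c = std::min(127.f, std::max(-128.f, r));  // ... and saturate
    return static_cast<std::int8_t>(c);                     // 4. f32 -> s8
}
```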

The eltwise primitive works with arbitrary data tensors. There is no special meaning associated with any logical dimensions.

The eltwise primitive doesn't support any post-ops or attributes.

Implementation Limitations

- Refer to Data Types for limitations related to data type support.

Performance Tips

- For backward propagation, use the same memory format for `src`, `diff_dst`, and `diff_src` (the formats of `diff_dst` and `diff_src` are always the same because of the API). Different formats are functionally supported but lead to highly suboptimal performance.
- Use in-place operations whenever possible.
- As mentioned above, for the ReLU operation with \(\alpha = 0\), one can use the \(dst\) tensor instead of \(src\). This enables the following potential optimizations for training:
  - ReLU can be safely done in-place.
  - Moreover, ReLU can be fused as a post-op with the previous operation if that operation doesn't require its \(dst\) to compute the backward propagation (e.g., the convolution operation satisfies this condition).
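Conceptually, fusing ReLU as a post-op means applying it to each output value of the previous operation as soon as that value is computed, instead of writing the whole output and reading it back in a second pass. The sketch below uses a tiny 1D convolution as the previous operation; both functions are hypothetical plain C++ and do not use the DNNL post-ops API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: a tiny 1D "valid" convolution that writes its full output.
std::vector<float> conv1d(const std::vector<float> &src, const std::vector<float> &w) {
    assert(!w.empty() && src.size() >= w.size());
    std::vector<float> dst(src.size() - w.size() + 1, 0.f);
    for (std::size_t o = 0; o < dst.size(); ++o)
        for (std::size_t k = 0; k < w.size(); ++k)
            dst[o] += src[o + k] * w[k];
    return dst;
}

// Fused: ReLU (alpha = 0) is applied to each accumulated value before it is
// stored, so no separate pass over dst is needed. This is valid for training
// because convolution backward needs src, weights and diff_dst, not its dst.
std::vector<float> conv1d_relu_fused(const std::vector<float> &src, const std::vector<float> &w) {
    assert(!w.empty() && src.size() >= w.size());
    std::vector<float> dst(src.size() - w.size() + 1);
    for (std::size_t o = 0; o < dst.size(); ++o) {
        float acc = 0.f;
        for (std::size_t k = 0; k < w.size(); ++k)
            acc += src[o + k] * w[k];
        dst[o] = acc > 0 ? acc : 0.f;  // ReLU "post-op"
    }
    return dst;
}
```

Both versions produce the same values; the fused one simply avoids storing the pre-ReLU output and reading it back.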