# Customized Operators

The public API for extended XPU operators is provided by the `itex.ops` namespace. These extended operators offer better performance than their stock TensorFlow counterparts.
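As a minimal sketch (assuming ITEX is installed and imported alongside TensorFlow), the extended operators are used as drop-in replacements for the corresponding stock APIs:

```
import tensorflow as tf
import intel_extension_for_tensorflow as itex

# Drop-in replacement: itex.ops.LayerNormalization mirrors
# tf.keras.layers.LayerNormalization (see the sections below).
x = tf.random.normal([8, 16])
layer = itex.ops.LayerNormalization(axis=-1)
y = layer(x)
```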

## `itex.ops.AdamWithWeightDecayOptimizer`

This optimizer implements the Adam algorithm with weight decay.

```
itex.ops.AdamWithWeightDecayOptimizer(
    weight_decay=0.001, learning_rate=0.001, beta_1=0.9, beta_2=0.999,
)
```

This is an implementation of the AdamW optimizer described in “Decoupled Weight Decay Regularization” by Loshchilov & Hutter (pdf). This Python API `itex.ops.AdamWithWeightDecayOptimizer` replaces tfa.optimizers.AdamW.

For example:

```
step = tf.Variable(0, trainable=False)
schedule = tf.optimizers.schedules.PiecewiseConstantDecay(
    [10000, 15000], [1e-0, 1e-1, 1e-2])
# lr and wd can be a function or a tensor
lr = 1e-1 * schedule(step)
wd = lambda: 1e-4 * schedule(step)

# ...

optimizer = itex.ops.AdamWithWeightDecayOptimizer(learning_rate=lr, weight_decay=wd)
```
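Conceptually, “decoupled” weight decay means the decay term is applied directly to the weights instead of being folded into the gradient that feeds the Adam moments. A rough sketch of a single update in plain NumPy (illustrative only; the hyperparameter values are placeholders, not the ITEX defaults):

```
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, wd=1e-3,
               beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One illustrative AdamW update; not the ITEX kernel itself."""
    m = beta_1 * m + (1 - beta_1) * grad          # first moment
    v = beta_2 * v + (1 - beta_2) * grad ** 2     # second moment
    m_hat = m / (1 - beta_1 ** t)                 # bias correction
    v_hat = v / (1 - beta_2 ** t)
    # Adam step plus a decay term applied directly to the weights.
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v
```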

## `itex.ops.LAMBOptimizer`

This optimizer implements the LAMB algorithm.

```
itex.ops.LAMBOptimizer(
    learning_rate=0.001, beta_1=0.9, beta_2=0.999,
    epsilon=1e-06, weight_decay=0.001, name='LAMB', **kwargs
)
```

This is an implementation of the LAMB optimizer described in “Large Batch Optimization for Deep Learning: Training BERT in 76 minutes” (pdf). This Python API `itex.ops.LAMBOptimizer` replaces tfa.optimizers.LAMB.

For example:

```
step = tf.Variable(0, trainable=False)
schedule = tf.optimizers.schedules.PiecewiseConstantDecay(
    [10000, 15000], [1e-0, 1e-1, 1e-2])
# lr and wd can be a function or a tensor
lr = 1e-1 * schedule(step)
wd = lambda: 1e-4 * schedule(step)

# ...

optimizer = itex.ops.LAMBOptimizer(learning_rate=lr, weight_decay=wd)
```
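The key idea behind LAMB is a layer-wise trust ratio: the Adam-style update for each layer is rescaled by the ratio of the weight norm to the update norm, which keeps training stable at very large batch sizes. A rough per-layer sketch (illustrative only, not the ITEX kernel):

```
import numpy as np

def lamb_layer_update(w, adam_update, lr=1e-3, wd=1e-3):
    """Illustrative LAMB scaling for one layer's weights."""
    r = adam_update + wd * w                  # Adam step plus decoupled weight decay
    w_norm = np.linalg.norm(w)
    r_norm = np.linalg.norm(r)
    # Trust ratio: scale the step by ||w|| / ||r|| (1.0 if either norm is zero).
    trust_ratio = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0
    return w - lr * trust_ratio * r
```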

## `itex.ops.LayerNormalization`

```
itex.ops.LayerNormalization(
    axis=-1, epsilon=0.001, center=True, scale=True,
    beta_initializer='zeros', gamma_initializer='ones',
    beta_regularizer=None, gamma_regularizer=None, beta_constraint=None,
    gamma_constraint=None, **kwargs
)
```

Layer Normalization normalizes the activations of the previous layer for each example in a batch independently, rather than across the batch as Batch Normalization does. It applies a transformation that keeps the mean activation within each example close to 0 and the activation standard deviation close to 1. This Python API `itex.ops.LayerNormalization` replaces tf.keras.layers.LayerNormalization.

For example:

```
>>> import numpy as np
>>> import tensorflow as tf
>>> import intel_extension_for_tensorflow as itex
>>> data = tf.constant(np.arange(10).reshape(5, 2) * 10, dtype=tf.float32)
>>> layer = itex.ops.LayerNormalization(axis=1)
>>> output = layer(data, training=False)
>>> print(output)
tf.Tensor(
[[-0.99998  0.99998]
 [-0.99998  0.99998]
 [-0.99998  0.99998]
 [-0.99998  0.99998]
 [-0.99998  0.99998]], shape=(5, 2), dtype=float32)
```
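As a sanity check (a NumPy sketch, not part of the API), the same output can be reproduced by normalizing each row with its own mean and variance:

```
import numpy as np

data = np.arange(10).reshape(5, 2) * 10.0
mean = data.mean(axis=1, keepdims=True)     # per-example mean
var = data.var(axis=1, keepdims=True)       # per-example variance
epsilon = 0.001                             # matches the layer's default
normalized = (data - mean) / np.sqrt(var + epsilon)
print(normalized)  # each row is approximately [-0.99998, 0.99998]
```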

## `itex.ops.GroupNormalization`

```
itex.ops.GroupNormalization(
    groups=32,
    axis=-1,
    epsilon=1e-3,
    center=True,
    scale=True,
    beta_initializer="zeros",
    gamma_initializer="ones",
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    **kwargs
)
```

Group Normalization divides the channels into groups and computes the mean and variance within each group for normalization. Empirically, its accuracy is more stable than batch normalization across a wide range of small batch sizes, provided the learning rate is adjusted linearly with the batch size. This Python API `itex.ops.GroupNormalization` replaces tf.keras.layers.GroupNormalization. Note that ITEX provides a faster GPU implementation only for 4D input with axis=-1; other cases use the same implementation as the original Keras layer.

For example:

```
>>> import tensorflow as tf
>>> import intel_extension_for_tensorflow as itex
>>> data = tf.random.normal((1, 8, 8, 32))
>>> layer = itex.ops.GroupNormalization(axis=-1)
>>> output = layer(data)
```

## `itex.ops.gelu`

Applies the Gaussian error linear unit (GELU) activation function.

```
itex.ops.gelu(
    features, approximate=False, name=None
)
```

Gaussian error linear unit (`GELU`) computes `x * P(X <= x)`, where `P(X) ~ N(0, 1)`. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as `ReLU` does. This Python API `itex.ops.gelu` replaces tf.nn.gelu.

For example:

```
>>> import tensorflow as tf
>>> import intel_extension_for_tensorflow as itex
>>> x = tf.constant([-3.0, -1.0, 0.0, 1.0, 3.0], dtype=tf.float32)
>>> y = itex.ops.gelu(x)
>>> y.numpy()
array([-0.00404969, -0.15865526,  0.        ,  0.8413447 ,  2.9959502 ],
      dtype=float32)
>>> y = itex.ops.gelu(x, approximate=True)
>>> y.numpy()
array([-0.00363725, -0.158808  ,  0.        ,  0.841192  ,  2.9963627 ],
      dtype=float32)
```
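As a sanity check (a sketch assuming SciPy is available), the exact (`approximate=False`) values above follow directly from the definition `x * P(X <= x)` with the standard normal CDF:

```
import numpy as np
from scipy.stats import norm

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
# Exact GELU: x * P(X <= x) for X ~ N(0, 1).
print(x * norm.cdf(x))
# Matches the approximate=False output above (up to float32 precision).
```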

## `itex.ops.ItexLSTM`

Long Short-Term Memory layer (first proposed in Hochreiter & Schmidhuber, 1997). This Python API `itex.ops.ItexLSTM` is semantically the same as tf.keras.layers.LSTM.

```
itex.ops.ItexLSTM(
    200, activation='tanh',
    recurrent_activation='sigmoid',
    use_bias=True,
    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros', **kwargs
)
```

Based on the available runtime hardware and constraints, this layer chooses between implementations (ITEX-based or the TensorFlow fallback) to maximize performance.
If a GPU is available and all arguments to the layer meet the requirements of the ITEX kernel (see below for details), the layer uses the fast Intel® Extension for TensorFlow* implementation; a short sketch after the list illustrates both cases. The requirements to use the ITEX implementation are:

1. `activation` == `tanh`

2. `recurrent_activation` == `sigmoid`

3. `use_bias` is `True`

4. Eager execution is enabled in the outermost context.
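
For illustration (a sketch, not part of the API), the first layer below satisfies the requirements and can use the fast ITEX kernel on GPU, while the second changes `activation` and therefore falls back to the generic implementation:

```
import tensorflow as tf
import intel_extension_for_tensorflow as itex

inputs = tf.random.normal([32, 10, 8])

# Meets the requirements: default tanh/sigmoid activations and use_bias=True.
fast_lstm = itex.ops.ItexLSTM(4)
out_fast = fast_lstm(inputs)

# activation != 'tanh', so this layer falls back to the generic implementation.
fallback_lstm = itex.ops.ItexLSTM(4, activation='relu')
out_fallback = fallback_lstm(inputs)
```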

For example:

```
>>> import tensorflow as tf
>>> import intel_extension_for_tensorflow as itex
>>> inputs = tf.random.normal([32, 10, 8])
>>> lstm = itex.ops.ItexLSTM(4)
>>> output = lstm(inputs)
>>> print(output.shape)
(32, 4)
>>> lstm = itex.ops.ItexLSTM(4, return_sequences=True, return_state=True)
>>> whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)
>>> print(whole_seq_output.shape)
(32, 10, 4)
>>> print(final_memory_state.shape)
(32, 4)
>>> print(final_carry_state.shape)
(32, 4)
```