# Customized Operators

Public API for extended XPU operators is provided by the `itex.ops` namespace. The extended API provides better performance than the original public API.

## `itex.ops.AdamWithWeightDecayOptimizer`

This optimizer implements the Adam algorithm with weight decay.

```python
itex.ops.AdamWithWeightDecayOptimizer(
    weight_decay=0.001,
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    name='Adam',
    **kwargs
)
```

This is an implementation of the AdamW optimizer described in "Decoupled Weight Decay Regularization" by Loshchilov & Hutter ([pdf](https://arxiv.org/abs/1711.05101)).

This Python API `itex.ops.AdamWithWeightDecayOptimizer` replaces [tfa.optimizers.AdamW](https://www.tensorflow.org/addons/api_docs/python/tfa/optimizers/AdamW).

For example:

```python
step = tf.Variable(0, trainable=False)
schedule = tf.optimizers.schedules.PiecewiseConstantDecay(
    [10000, 15000], [1e-0, 1e-1, 1e-2])
# lr and wd can be a function or a tensor
lr = 1e-1 * schedule(step)
wd = lambda: 1e-4 * schedule(step)

# ...

optimizer = itex.ops.AdamWithWeightDecayOptimizer(
    learning_rate=lr, weight_decay=wd)
```

## `itex.ops.LAMBOptimizer`

This optimizer implements the LAMB algorithm.

```python
itex.ops.LAMBOptimizer(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-06,
    weight_decay=0.001,
    name='LAMB',
    **kwargs
)
```

This is an implementation of the LAMB optimizer described in "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes" ([pdf](https://arxiv.org/abs/1904.00962)).

This Python API `itex.ops.LAMBOptimizer` replaces [tfa.optimizers.LAMB](https://www.tensorflow.org/addons/api_docs/python/tfa/optimizers/LAMB).

For example:

```python
step = tf.Variable(0, trainable=False)
schedule = tf.optimizers.schedules.PiecewiseConstantDecay(
    [10000, 15000], [1e-0, 1e-1, 1e-2])
# lr and wd can be a function or a tensor
lr = 1e-1 * schedule(step)
wd = lambda: 1e-4 * schedule(step)

# ...

optimizer = itex.ops.LAMBOptimizer(learning_rate=lr, weight_decay=wd)
```
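Both optimizers follow the Keras optimizer interface, so they can be passed to `model.compile` in the same way as the `tfa` optimizers they replace. Below is a minimal sketch, assuming a toy regression model; the model, data, and hyperparameters are illustrative choices, not part of the ITEX API.

```python
import tensorflow as tf
import intel_extension_for_tensorflow as itex

# Minimal sketch: using the ITEX AdamW optimizer as a drop-in for
# tfa.optimizers.AdamW when compiling a toy Keras model. The model,
# data, and hyperparameters below are illustrative assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(8,)),
    tf.keras.layers.Dense(1),
])

optimizer = itex.ops.AdamWithWeightDecayOptimizer(
    learning_rate=1e-3, weight_decay=1e-4)

model.compile(optimizer=optimizer, loss='mse')

x = tf.random.normal([64, 8])
y = tf.random.normal([64, 1])
model.fit(x, y, epochs=1, verbose=0)
```

`itex.ops.LAMBOptimizer` can be substituted in the same way, since it takes the same `learning_rate` and `weight_decay` arguments.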
## `itex.ops.LayerNormalization`

[Layer normalization layer (Ba et al., 2016)](https://arxiv.org/abs/1607.06450).

```python
itex.ops.LayerNormalization(
    axis=-1,
    epsilon=0.001,
    center=True,
    scale=True,
    beta_initializer='zeros',
    gamma_initializer='ones',
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    **kwargs
)
```

Normalizes the activations of the previous layer for each example in a batch independently, rather than across the batch as in Batch Normalization. It applies a transformation that keeps the mean activation within each example close to 0 and the activation standard deviation close to 1.

This Python API `itex.ops.LayerNormalization` replaces [tf.keras.layers.LayerNormalization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LayerNormalization).

For example:

```sh
>>> import numpy as np
>>> import tensorflow as tf
>>> import intel_extension_for_tensorflow as itex
>>> data = tf.constant(np.arange(10).reshape(5, 2) * 10, dtype=tf.float32)
>>> layer = itex.ops.LayerNormalization(axis=1)
>>> output = layer(data, training=False)
>>> print(output)
tf.Tensor(
[[-0.99998  0.99998]
 [-0.99998  0.99998]
 [-0.99998  0.99998]
 [-0.99998  0.99998]
 [-0.99998  0.99998]], shape=(5, 2), dtype=float32)
```

## `itex.ops.GroupNormalization`

[Group normalization layer (Yuxin Wu, Kaiming He)](https://arxiv.org/abs/1803.08494).

```python
itex.ops.GroupNormalization(
    groups=32,
    axis=-1,
    epsilon=1e-3,
    center=True,
    scale=True,
    beta_initializer="zeros",
    gamma_initializer="ones",
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    **kwargs
)
```

Group Normalization divides the channels into groups and computes the mean and variance within each group for normalization. Empirically, its accuracy is more stable than batch normalization over a wide range of small batch sizes, if the learning rate is adjusted linearly with the batch size.

This Python API `itex.ops.GroupNormalization` replaces [tf.keras.layers.GroupNormalization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GroupNormalization). Note that ITEX provides a faster GPU implementation for 4D input with `axis=-1`; all other cases behave the same as the original Keras layer.

For example:

```sh
>>> import tensorflow as tf
>>> import intel_extension_for_tensorflow as itex
>>> data = tf.random.normal((1, 8, 8, 32))
>>> layer = itex.ops.GroupNormalization(axis=-1)
>>> output = layer(data)
```

## `itex.ops.gelu`

Applies the Gaussian error linear unit (GELU) activation function.

```python
itex.ops.gelu(
    features, approximate=False, name=None
)
```

The Gaussian error linear unit (GELU) computes `x * P(X <= x)`, where `P(X) ~ N(0, 1)`. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLU.

This Python API `itex.ops.gelu` replaces [tf.nn.gelu](https://www.tensorflow.org/api_docs/python/tf/nn/gelu).

For example:

```sh
>>> import tensorflow as tf
>>> import intel_extension_for_tensorflow as itex
>>> x = tf.constant([-3.0, -1.0, 0.0, 1.0, 3.0], dtype=tf.float32)
>>> y = itex.ops.gelu(x)
>>> y.numpy()
array([-0.00404969, -0.15865526,  0.        ,  0.8413447 ,  2.9959502 ],
      dtype=float32)
>>> y = itex.ops.gelu(x, approximate=True)
>>> y.numpy()
array([-0.00363725, -0.158808  ,  0.        ,  0.841192  ,  2.9963627 ],
      dtype=float32)
```

## `itex.ops.ItexLSTM`

Long Short-Term Memory layer (first proposed in Hochreiter & Schmidhuber, 1997). This Python API `itex.ops.ItexLSTM` is semantically the same as [tf.keras.layers.LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM).

```python
itex.ops.ItexLSTM(
    200, activation='tanh',
    recurrent_activation='sigmoid',
    use_bias=True,
    kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal',
    bias_initializer='zeros',
    **kwargs
)
```

Based on the available runtime hardware and constraints, this layer chooses between different implementations (ITEX-based or a fallback TensorFlow implementation) to maximize performance. If a GPU is available and all arguments to the layer meet the requirements of the ITEX kernel (see below for details), the layer uses the fast Intel® Extension for TensorFlow* implementation. The requirements for using the ITEX implementation are:

1. `activation` == `tanh`
2. `recurrent_activation` == `sigmoid`
3. `use_bias` is `True`
4. Inputs, if masking is used, are strictly right-padded.
5. Eager execution is enabled in the outermost context.

For example:

```sh
>>> import tensorflow as tf
>>> import intel_extension_for_tensorflow as itex
>>> inputs = tf.random.normal([32, 10, 8])
>>> lstm = itex.ops.ItexLSTM(4)
>>> output = lstm(inputs)
>>> print(output.shape)
(32, 4)
>>> lstm = itex.ops.ItexLSTM(4, return_sequences=True, return_state=True)
>>> whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)
>>> print(whole_seq_output.shape)
(32, 10, 4)
>>> print(final_memory_state.shape)
(32, 4)
>>> print(final_carry_state.shape)
(32, 4)
```
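Because `itex.ops.ItexLSTM` keeps the `tf.keras.layers.LSTM` interface, it can also be dropped into a Keras model. Below is a minimal sketch with an illustrative toy model (the shapes and layer sizes are assumptions, not from the ITEX docs); the default arguments satisfy the ITEX kernel requirements listed above.

```python
import tensorflow as tf
import intel_extension_for_tensorflow as itex

# Minimal sketch: ItexLSTM stacked inside a toy Keras model as a
# drop-in replacement for tf.keras.layers.LSTM. The defaults
# (activation='tanh', recurrent_activation='sigmoid', use_bias=True)
# match the ITEX kernel requirements listed above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10, 8)),
    itex.ops.ItexLSTM(32, return_sequences=True),
    itex.ops.ItexLSTM(16),
    tf.keras.layers.Dense(2),
])

x = tf.random.normal([4, 10, 8])
print(model(x).shape)  # (4, 2)
```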