Customized Operators
Public API for extended XPU operators is provided by the itex.ops
namespace. The extended API provides better performance than the corresponding stock TensorFlow API.
itex.ops.AdamWithWeightDecayOptimizer
This optimizer implements the Adam algorithm with weight decay.
itex.ops.AdamWithWeightDecayOptimizer(
weight_decay_rate=0.001, learning_rate=0.001, beta_1=0.9, beta_2=0.999,
epsilon=1e-07, name='Adam',
exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"], **kwargs
)
This is an implementation of the AdamW optimizer described in "Decoupled Weight Decay Regularization" by Loshchilov & Hutter. This Python API itex.ops.AdamWithWeightDecayOptimizer
replaces tfa.optimizers.AdamW.
For example:
import tensorflow as tf
import intel_extension_for_tensorflow as itex

step = tf.Variable(0, trainable=False)
schedule = tf.optimizers.schedules.PiecewiseConstantDecay(
    [10000, 15000], [1e-0, 1e-1, 1e-2])
# lr and wd can be a function or a tensor
lr = 1e-1 * schedule(step)
wd = lambda: 1e-4 * schedule(step)
# ...
optimizer = itex.ops.AdamWithWeightDecayOptimizer(
    learning_rate=lr, weight_decay_rate=wd)
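Since the optimizer is a drop-in replacement for tfa.optimizers.AdamW, it can be passed to model.compile like any Keras optimizer. A minimal sketch (the model and loss here are illustrative assumptions, not from the ITEX documentation):

import tensorflow as tf
import intel_extension_for_tensorflow as itex

# Toy model; any Keras model is compiled the same way (illustrative only).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])

optimizer = itex.ops.AdamWithWeightDecayOptimizer(
    learning_rate=1e-3,
    weight_decay_rate=1e-4,
    # Variables whose names match these substrings are excluded from
    # weight decay (the default from the signature above).
    exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"],
)

model.compile(optimizer=optimizer, loss="mse")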
itex.ops.LayerNormalization
Layer normalization layer (Ba et al., 2016).
itex.ops.LayerNormalization(
axis=-1, epsilon=0.001, center=True, scale=True,
beta_initializer='zeros', gamma_initializer='ones',
beta_regularizer=None, gamma_regularizer=None, beta_constraint=None,
gamma_constraint=None, **kwargs
)
Normalize the activations of the previous layer for each given example in a batch independently, rather than across a batch like Batch Normalization. This applies a transformation that keeps the mean activation within each example close to 0 and the activation standard deviation close to 1. This Python API itex.ops.LayerNormalization
replaces tf.keras.layers.LayerNormalization.
For example:
>>> import numpy as np
>>> import tensorflow as tf
>>> import intel_extension_for_tensorflow as itex
>>> data = tf.constant(np.arange(10).reshape(5, 2) * 10, dtype=tf.float32)
>>> layer = itex.ops.LayerNormalization(axis=1)
>>> output = layer(data, training=False)
>>> print(output)
tf.Tensor(
[[-0.99998 0.99998]
[-0.99998 0.99998]
[-0.99998 0.99998]
[-0.99998 0.99998]
[-0.99998 0.99998]], shape=(5, 2), dtype=float32)
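The output above can be checked by hand. A quick NumPy sketch, assuming gamma and beta stay at their 'ones'/'zeros' initializers and epsilon at the layer's 0.001 default:

import numpy as np

data = np.arange(10).reshape(5, 2) * 10.0
# Per-example mean and variance over axis=1, matching axis=1 above.
mean = data.mean(axis=1, keepdims=True)
var = data.var(axis=1, keepdims=True)
# With gamma=1 and beta=0, the layer output is just the normalized input.
print((data - mean) / np.sqrt(var + 0.001))
# Each row normalizes to approximately [-0.99998, 0.99998].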
itex.ops.gelu
Applies the Gaussian error linear unit (GELU) activation function.
itex.ops.gelu(
features, approximate=False, name=None
)
The Gaussian error linear unit (GELU) computes x * P(X <= x), where P(X) ~ N(0, 1). The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLU. This Python API itex.ops.gelu
replaces tf.nn.gelu.
For example:
>>> import tensorflow as tf
>>> import intel_extension_for_tensorflow as itex
>>> x = tf.constant([-3.0, -1.0, 0.0, 1.0, 3.0], dtype=tf.float32)
>>> y = itex.ops.gelu(x)
>>> y.numpy()
array([-0.00404969, -0.15865526, 0. , 0.8413447 , 2.9959502 ],
dtype=float32)
>>> y = itex.ops.gelu(x, approximate=True)
>>> y.numpy()
array([-0.00363725, -0.158808 , 0. , 0.841192 , 2.9963627 ],
dtype=float32)
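The two modes follow the standard exact and tanh-approximate GELU formulas (Hendrycks & Gimpel). As a sketch for reference, both can be reproduced with plain TensorFlow ops rather than ITEX-specific code:

import numpy as np
import tensorflow as tf

x = tf.constant([-3.0, -1.0, 0.0, 1.0, 3.0], dtype=tf.float32)

# Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
exact = 0.5 * x * (1.0 + tf.math.erf(x / tf.sqrt(2.0)))

# Tanh approximation, used when approximate=True.
approx = 0.5 * x * (1.0 + tf.tanh(
    np.sqrt(2.0 / np.pi) * (x + 0.044715 * tf.pow(x, 3))))

print(exact.numpy())   # matches itex.ops.gelu(x)
print(approx.numpy())  # matches itex.ops.gelu(x, approximate=True)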
itex.ops.ItexLSTM
¶
Long Short-Term Memory layer (first proposed in Hochreiter & Schmidhuber, 1997), this python API itex.ops.ItexLSTM
is semantically the same as tf.keras.layers.LSTM.
itex.ops.ItexLSTM(
200, activation='tanh',
recurrent_activation='sigmoid',
use_bias=True,
kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal',
bias_initializer='zeros', **kwargs
)
Based on available runtime hardware and constraints, this layer chooses different implementations (ITEX-based or fallback TensorFlow) to maximize performance.
If a GPU is available and all arguments to the layer meet the requirements of the ITEX kernel (see below for details), the layer uses the fast Intel® Extension for TensorFlow* implementation.
The requirements to use the ITEX implementation are:
- activation == tanh
- recurrent_activation == sigmoid
- use_bias is True
- Inputs, if masking is used, are strictly right-padded.
- Eager execution is enabled in the outermost context.
For example:
>>> import tensorflow as tf
>>> import intel_extension_for_tensorflow as itex
>>> inputs = tf.random.normal([32, 10, 8])
>>> lstm = itex.ops.ItexLSTM(4)
>>> output = lstm(inputs)
>>> print(output.shape)
(32, 4)
>>> lstm = itex.ops.ItexLSTM(4, return_sequences=True, return_state=True)
>>> whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)
>>> print(whole_seq_output.shape)
(32, 10, 4)
>>> print(final_memory_state.shape)
(32, 4)
>>> print(final_carry_state.shape)
(32, 4)
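Because ItexLSTM is semantically the same as tf.keras.layers.LSTM, it also drops into a Keras model unchanged. A short sketch (the model shape and loss are illustrative assumptions):

import tensorflow as tf
import intel_extension_for_tensorflow as itex

# Sequence classifier over inputs of shape (timesteps=10, features=8).
# The defaults (tanh, sigmoid, use_bias=True) satisfy the ITEX kernel
# requirements listed above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10, 8)),
    itex.ops.ItexLSTM(4),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()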