Quick Start: AMX with TensorFlow
Follow these steps to set up TensorFlow with Intel AMX acceleration (BF16 / FP16).
1. (Recommended) Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
2. Install TensorFlow
pip install --upgrade tensorflow
For platform-specific guidance, see: https://www.tensorflow.org/install
3. Verify the installation
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
Sample: Matrix Multiplication with AMX
The example below benchmarks BF16 matrix multiplication.
Notes:
- BF16 mixed precision: broadly supported on AMX‑enabled Intel CPUs.
- FP16 mixed precision: supported only on Intel® Xeon® 6 processors.
import os
import time
# (Optional) Verbose oneDNN logging for verification.
# Set it before importing TensorFlow so it is in place when oneDNN initializes.
# os.environ["ONEDNN_VERBOSE"] = "1"
import tensorflow as tf
# Configuration (adjust as needed)
BATCH_SIZE = 32
INPUT_FEATURES = 4096
HIDDEN_FEATURES = 4096
NUM_ITERATIONS = 10
# Initialize random tensors (BF16)
A = tf.random.uniform([BATCH_SIZE, INPUT_FEATURES], dtype=tf.bfloat16)
B = tf.random.uniform([INPUT_FEATURES, HIDDEN_FEATURES], dtype=tf.bfloat16)
# Benchmark loop
total_time = 0.0
for i in range(NUM_ITERATIONS):
    start = time.perf_counter()
    C = tf.matmul(A, B)  # dispatched to oneDNN; can use AMX kernels on supported CPUs
    elapsed = time.perf_counter() - start
    total_time += elapsed
    print(f"Iteration {i+1}: {elapsed:.6f} seconds")
print(f"Average execution time: {total_time / NUM_ITERATIONS:.6f} seconds")
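The loop above averages over every iteration, including the first, which is usually the slowest. A minimal sketch of a warm-up-aware variant follows. The `benchmark` helper and its `warmup` parameter are hypothetical names introduced for illustration; they are not part of TensorFlow or the original sample, and the workload in the demo is a stand-in so the sketch runs without TensorFlow.

```python
import statistics
import time


def benchmark(fn, warmup=2, iterations=10):
    """Time a zero-argument callable, discarding warm-up runs.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    # Warm-up runs absorb one-time costs and are not timed.
    for _ in range(warmup):
        fn()

    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)

    return {
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "min": min(timings),
        "timings": timings,
    }


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without TensorFlow installed.
    result = benchmark(lambda: sum(i * i for i in range(10_000)))
    print(f"mean {result['mean']:.6f} s, median {result['median']:.6f} s")
```

With the tensors from the sample, the call would look like `benchmark(lambda: tf.matmul(A, B), warmup=2, iterations=NUM_ITERATIONS)`. Reporting the median alongside the mean makes a single slow run less likely to skew the result.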
Verifying AMX Utilization
Enable oneDNN verbose output before running the script:
export ONEDNN_VERBOSE=1
python your_script.py
Or set it inside the script, ideally before importing TensorFlow so the variable is in place when oneDNN initializes:
os.environ["ONEDNN_VERBOSE"] = "1"
Look for lines indicating AMX usage (example excerpt):
onednn_verbose,v1,info,oneDNN v3.7.3 (commit N/A)
onednn_verbose,v1,info,cpu,isa:Intel AVX-512 with float16, Intel DL Boost and bfloat16 support and Intel AMX with bfloat16 and 8-bit integer support
onednn_verbose,v1,primitive,exec,cpu,matmul,brg_matmul:avx10_1_512_amx,...
Iteration 1: 0.012289 seconds
...
Average execution time: 0.001941 seconds
Key indicator: an implementation name containing amx, such as brg_matmul:avx10_1_512_amx, confirms that the BF16 matmul ran on an AMX-backed kernel.
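Scanning a long verbose log by eye is error-prone, so the check can be scripted. The sketch below is a hypothetical helper (`amx_primitives` is not part of oneDNN or TensorFlow). It assumes the comma-separated layout shown in the excerpt above and looks only at primitive execution lines, so the ISA banner line, which also mentions AMX, is ignored.

```python
def amx_primitives(log_lines):
    """Return implementation names from oneDNN verbose exec lines that mention AMX.

    Hypothetical helper; assumes the comma-separated layout of the excerpt
    and does not rely on fixed field positions.
    """
    found = []
    for line in log_lines:
        fields = line.strip().split(",")
        # Keep only primitive execution lines; the ISA banner has no "exec" field.
        if fields[0] != "onednn_verbose" or "exec" not in fields:
            continue
        for field in fields:
            if "amx" in field.lower() and field not in found:
                found.append(field)
    return found


if __name__ == "__main__":
    # Lines copied from the excerpt above (the trailing "..." is kept as-is).
    sample = [
        "onednn_verbose,v1,info,oneDNN v3.7.3 (commit N/A)",
        "onednn_verbose,v1,info,cpu,isa:Intel AVX-512 with float16, Intel DL Boost "
        "and bfloat16 support and Intel AMX with bfloat16 and 8-bit integer support",
        "onednn_verbose,v1,primitive,exec,cpu,matmul,brg_matmul:avx10_1_512_amx,...",
        "Iteration 1: 0.012289 seconds",
    ]
    print(amx_primitives(sample))  # ['brg_matmul:avx10_1_512_amx']
```

To use it on a real run, redirect the script output to a file (for example `python your_script.py > run.log 2>&1` with ONEDNN_VERBOSE=1 set) and pass the open file to `amx_primitives`. An empty result suggests that no AMX kernel was selected.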
Troubleshooting
- Missing AMX indicators: Ensure the CPU supports AMX and that the OS kernel has AMX enabled (XSAVE/XFD configured). On Linux, this requires kernel 5.16 or later.
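- Slow first iteration: Expected. The first call includes one-time costs such as oneDNN kernel generation and TensorFlow runtime initialization, in addition to cache warmup.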
- To test FP16 (Xeon 6 onwards): Replace the tf.bfloat16 dtypes with tf.float16.
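When the AMX indicators are missing, a quick first check is whether the kernel reports AMX at all. The sketch below assumes a Linux host and reads /proc/cpuinfo. The `amx_flags` helper is a hypothetical name. On AMX-enabled systems, flags such as amx_tile, amx_bf16, and amx_int8 are typically listed. An amx_fp16 flag may appear on FP16-capable parts, though that flag name is an assumption here.

```python
import os


def amx_flags(cpuinfo_text):
    """Return the sorted AMX-related CPU flags found in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # On x86 Linux, each logical CPU has a "flags : ..." line.
        if line.startswith("flags"):
            _, _, values = line.partition(":")
            flags.update(f for f in values.split() if f.startswith("amx"))
    return sorted(flags)


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux only; an assumption of this sketch
    if os.path.exists(path):
        with open(path) as handle:
            detected = amx_flags(handle.read())
        print("AMX flags:", ", ".join(detected) if detected else "none reported")
    else:
        print(f"{path} not found; this check only applies to Linux")
```

If no AMX flags are reported on hardware that should support them, the kernel version or a virtualization layer hiding the feature are plausible causes to investigate.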
This completes the setup and verification workflow.
Leveraging AMX for different AI Use-Cases
For guidance on how to enable AMX for different AI use-cases, follow the links below:
| Use Case | Model | Description | README |
|---|---|---|---|
| Natural Language Processing | BERT-Large (Uncased) | Sequence classification | Link |
| Graph Neural Networks | R-GAT | Relational graph attention network inference | Link |
| Computer Vision | ResNet50 v1.5 | Image classification (CNN) | Link |