Optimization Orchestration
Introduction
Orchestration is the combination of multiple optimization techniques applied simultaneously (one-shot). Intel Neural Compressor supports arbitrary meaningful combinations of its supported optimization methods in one-shot mode, such as pruning during quantization-aware training.
One-shot
Quantization-aware training, pruning, and distillation all optimize the model during the training process, so any meaningful combination of them can be applied in a single (one-shot) training run. Such a combination often yields better performance and accuracy than a single compression technique, and is usually about as efficient as applying one technique alone. Three such combinations are listed below.
Pruning during quantization-aware training
Distillation with pattern lock pruning
Distillation with pattern lock pruning and quantization-aware training
Orchestration Support Matrix
Orchestration | Combinations | Supported |
---|---|---|
One-shot | Pruning + Quantization Aware Training | ✔ |
One-shot | Distillation + Quantization Aware Training | ✔ |
One-shot | Distillation + Pruning | ✔ |
One-shot | Distillation + Pruning + Quantization Aware Training | ✔ |
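The matrix above can also be written down as a small lookup, which is handy for validating a requested combination before training starts. This is only a sketch: `SUPPORTED_ONE_SHOT` and `is_supported_one_shot` are hypothetical names chosen for illustration and are not part of the Neural Compressor API, and `"qat"` is shorthand for quantization-aware training.

```python
# Hypothetical helper, not part of Intel Neural Compressor.
# Each frozenset mirrors one row of the support matrix.
SUPPORTED_ONE_SHOT = {
    frozenset({"pruning", "qat"}),
    frozenset({"distillation", "qat"}),
    frozenset({"distillation", "pruning"}),
    frozenset({"distillation", "pruning", "qat"}),
}


def is_supported_one_shot(*techniques: str) -> bool:
    """Return True if the given techniques form a row of the matrix."""
    return frozenset(t.lower() for t in techniques) in SUPPORTED_ONE_SHOT


print(is_supported_one_shot("pruning", "qat"))  # True
print(is_supported_one_shot("pruning"))         # False: a single technique is not a combination
```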
Get Started with Orchestration API
Neural Compressor defines a Scheduler class that automatically pipelines the execution of model optimizations in a one-shot way.
The user instantiates the model optimization components, such as quantization, pruning, and distillation, separately. The user then appends those optimization objects to the scheduler's pipeline, and the scheduler API executes them one by one.
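The idea of registering separate components and having one object drive them in order can be sketched in plain Python. The classes below (`ToyDistillation`, `ToyPruning`, `HookFanOut`) are hypothetical stand-ins invented for this illustration; they are not the library's implementation and they do not compress anything. The sketch assumes each component exposes the same hook names and that loss adjustments are chained in registration order.

```python
class ToyDistillation:
    """Hypothetical component: blends the task loss with a fixed teacher loss."""

    def __init__(self, teacher_loss, weight=0.5):
        self.teacher_loss = teacher_loss
        self.weight = weight

    def on_step_begin(self, step):
        pass  # nothing to do at step start in this toy

    def on_after_compute_loss(self, batch, output, loss):
        return (1 - self.weight) * loss + self.weight * self.teacher_loss


class ToyPruning:
    """Hypothetical component: only records which steps it has seen."""

    def __init__(self):
        self.steps_seen = []

    def on_step_begin(self, step):
        self.steps_seen.append(step)

    def on_after_compute_loss(self, batch, output, loss):
        return loss  # pruning leaves the loss untouched in this toy


class HookFanOut:
    """Hypothetical manager: forwards each hook to every component in order."""

    def __init__(self, components):
        self.components = list(components)

    def on_step_begin(self, step):
        for component in self.components:
            component.on_step_begin(step)

    def on_after_compute_loss(self, batch, output, loss):
        # Each component may adjust the loss; the results are chained.
        for component in self.components:
            loss = component.on_after_compute_loss(batch, output, loss)
        return loss


distiller = ToyDistillation(teacher_loss=2.0)
pruner = ToyPruning()
manager = HookFanOut([distiller, pruner])
manager.on_step_begin(0)
combined = manager.on_after_compute_loss(batch=None, output=None, loss=4.0)
print(combined)  # 3.0
```

Because every component sees every hook within the same training step, the techniques are applied together in one training run rather than in separate runs.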
The following example executes distillation and pruning in a one-shot way:
from neural_compressor.training import prepare_compression
from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig, WeightPruningConfig

# `model`, `dataloader`, `optimizer` and `epochs` are assumed to be defined by the user.
distillation_criterion = KnowledgeDistillationLossConfig()
d_conf = DistillationConfig(model, distillation_criterion)
p_conf = WeightPruningConfig()
# Passing both configs applies distillation and pruning in the same training run.
compression_manager = prepare_compression(model=model, confs=[d_conf, p_conf])

# Training loop: all hooks are invoked through compression_manager.callbacks.
compression_manager.callbacks.on_train_begin()
for epoch in range(epochs):
    compression_manager.callbacks.on_epoch_begin(epoch)
    for i, batch in enumerate(dataloader):
        compression_manager.callbacks.on_step_begin(i)
        ......
        output = model(batch)
        loss = ......
        # Let the compression components adjust the loss (e.g. add the distillation term).
        loss = compression_manager.callbacks.on_after_compute_loss(batch, output, loss)
        loss.backward()
        compression_manager.callbacks.on_before_optimizer_step()
        optimizer.step()
        compression_manager.callbacks.on_step_end()
    compression_manager.callbacks.on_epoch_end()
compression_manager.callbacks.on_train_end()

model.save('./path/to/save')
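The order in which the training loop above fires its hooks can be checked without a model or the library installed. The sketch below uses a hypothetical `RecordingHooks` stub that only logs hook names, and a hypothetical `dry_run` function that mirrors the loop structure; neither is part of Neural Compressor, and the forward pass, backward pass and optimizer step are deliberately left out.

```python
class RecordingHooks:
    """Hypothetical stub that records hook names instead of compressing a model."""

    def __init__(self):
        self.events = []

    def __getattr__(self, name):
        # Only called for attributes that are not found normally,
        # so `self.events` is unaffected.
        if not name.startswith("on_"):
            raise AttributeError(name)

        def hook(*args):
            self.events.append(name)
            # on_after_compute_loss must hand the loss back to the caller.
            return args[-1] if name == "on_after_compute_loss" else None

        return hook


def dry_run(hooks, epochs, batches):
    """Walk through the same hook sequence as the training loop, without training."""
    hooks.on_train_begin()
    for epoch in range(epochs):
        hooks.on_epoch_begin(epoch)
        for i, batch in enumerate(batches):
            hooks.on_step_begin(i)
            loss = hooks.on_after_compute_loss(batch, None, 0.0)
            hooks.on_before_optimizer_step()
            hooks.on_step_end()
        hooks.on_epoch_end()
    hooks.on_train_end()
    return hooks.events


for name in dry_run(RecordingHooks(), epochs=1, batches=[0, 1]):
    print(name)
```

Swapping the stub for a real compression manager's callbacks object and restoring the forward, backward and optimizer calls gives back the loop shown above.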