Optimization Orchestration

  1. Introduction

    1.1. One-shot

  2. Orchestration Support Matrix

  3. Get Started with Orchestration API

  4. Examples

Introduction

Orchestration is the combination of multiple optimization techniques applied simultaneously (one-shot). Intel Neural Compressor supports arbitrary meaningful combinations of its supported optimization methods under one-shot, such as pruning during quantization-aware training.

One-shot

Quantization-aware training, pruning, and distillation all rely on the training process, so any meaningful combination of them can be applied in a single (one-shot) training run. Such a combination often yields better performance and accuracy than a single compression technique, and is usually as efficient as applying just one. Possible combinations include the following.

  • Pruning during quantization-aware training

  • Distillation with pattern lock pruning

  • Distillation with pattern lock pruning and quantization-aware training
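To make the first combination concrete, here is a minimal pure-Python sketch of what "pruning during quantization-aware training" can mean for the weights seen by a forward pass: a magnitude-based mask is applied first, then the surviving weights are rounded to a simulated integer grid. The function names, the magnitude criterion, and the symmetric 8-bit scheme are assumptions chosen for illustration; this is not Intel Neural Compressor's implementation.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the smallest-magnitude fraction of weights."""
    num_pruned = int(len(weights) * sparsity)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(by_magnitude[:num_pruned])
    return [0.0 if i in pruned else 1.0 for i in range(len(weights))]


def fake_quantize(weights, num_bits=8):
    """Round weights to a symmetric integer grid, then map them back to floats."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [round(w / scale) * scale for w in weights], scale


def one_shot_forward_weights(weights, sparsity, num_bits=8):
    """Weights as one forward pass would see them: pruned first, then fake-quantized."""
    mask = magnitude_mask(weights, sparsity)
    masked = [w * m for w, m in zip(weights, mask)]
    quantized, scale = fake_quantize(masked, num_bits)
    return mask, quantized, scale


# Toy weights (illustrative values only).
mask, effective, scale = one_shot_forward_weights([0.5, -0.05, 0.8, 0.01], sparsity=0.5)
print(mask)       # the two smallest-magnitude weights are masked out
print(effective)  # pruned entries stay zero, the rest sit on the quantization grid
```

Because both transformations happen inside the same training step, the remaining weights can adapt to the mask and to the rounding error at the same time, which is the intuition behind running the techniques in one shot.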

Orchestration Support Matrix

Orchestration    Supported Combinations
One-shot         Pruning + Quantization Aware Training
                 Distillation + Quantization Aware Training
                 Distillation + Pruning
                 Distillation + Pruning + Quantization Aware Training

Get Started with Orchestration API

Neural Compressor defines a Scheduler class that automatically pipelines the execution of model optimizations in a one-shot way.

The user instantiates the model optimization components, such as quantization, pruning, and distillation, separately. The user can then append these optimization objects to the scheduler’s pipeline, and the scheduler API executes them one by one.
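The pipeline idea described above can be sketched in a few lines of plain Python. ToyComponent and ToyPipeline are hypothetical names invented for this illustration and are not Neural Compressor classes; the sketch only shows the general pattern of appending components and forwarding each training hook to them in order.

```python
class ToyComponent:
    """Hypothetical stand-in for one optimization component (not an INC class)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def on_train_begin(self):
        self.log.append(f"{self.name}:on_train_begin")

    def on_step_end(self):
        self.log.append(f"{self.name}:on_step_end")


class ToyPipeline:
    """Hypothetical pipeline that forwards each hook to every component, in order."""

    def __init__(self):
        self.components = []

    def append(self, component):
        self.components.append(component)

    def on_train_begin(self):
        for component in self.components:
            component.on_train_begin()

    def on_step_end(self):
        for component in self.components:
            component.on_step_end()


log = []
pipeline = ToyPipeline()
pipeline.append(ToyComponent("distillation", log))
pipeline.append(ToyComponent("pruning", log))

# A single call on the pipeline reaches every appended component.
pipeline.on_train_begin()
pipeline.on_step_end()
print(log)
```

The training loop only ever talks to the pipeline object, so adding or removing a technique does not change the loop itself.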

The following example executes distillation and pruning in a one-shot way.

from neural_compressor.training import prepare_compression
from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig, WeightPruningConfig

# `model`, `dataloader`, `optimizer` and `epochs` are assumed to be defined by the user.
distillation_criterion = KnowledgeDistillationLossConfig()
d_conf = DistillationConfig(model, distillation_criterion)
p_conf = WeightPruningConfig()
compression_manager = prepare_compression(model=model, confs=[d_conf, p_conf])

# Training loop: every hook is invoked through compression_manager.callbacks.
compression_manager.callbacks.on_train_begin()
for epoch in range(epochs):
    compression_manager.callbacks.on_epoch_begin(epoch)
    for i, batch in enumerate(dataloader):
        compression_manager.callbacks.on_step_begin(i)
        ......
        output = model(batch)
        loss = ......
        # Let the compression components adjust the loss (for example, add the distillation term).
        loss = compression_manager.callbacks.on_after_compute_loss(batch, output, loss)
        loss.backward()
        compression_manager.callbacks.on_before_optimizer_step()
        optimizer.step()
        compression_manager.callbacks.on_step_end()
    compression_manager.callbacks.on_epoch_end()
compression_manager.callbacks.on_train_end()

model.save('./path/to/save')
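The on_after_compute_loss hook in the example takes the loss computed by the training loop and returns a possibly modified loss. As a rough illustration of what a distillation component could do at that point, the sketch below blends the student's own loss with a KL-divergence term against the teacher's softened predictions. The function names, the temperature parameter, and the 0.5/0.5 weights are assumptions made for this sketch; it does not reproduce Neural Compressor's actual loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, student_loss,
                      temperature=1.0, loss_weights=(0.5, 0.5)):
    """Weighted sum of the student's task loss and a teacher-matching term."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    soft_loss = kl_divergence(teacher_probs, student_probs)
    return loss_weights[0] * student_loss + loss_weights[1] * soft_loss


# Toy logits and loss values (illustrative only).
matched = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0], student_loss=2.0)
mismatched = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0], student_loss=2.0)
print(matched, mismatched)
```

When the student already matches the teacher, the extra term vanishes and only the weighted task loss remains; the further the two distributions drift apart, the larger the returned loss becomes.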

Examples

Orchestration Examples