# Quick Start

The following instructions assume you have installed the Intel® Extension for PyTorch\*. For installation instructions, refer to [Installation](../../../index.html#installation?platform=cpu&version=main).

To start using the Intel® Extension for PyTorch\* in your code, you need to make the following changes:

1. Import the extension with `import intel_extension_for_pytorch as ipex`.
2. Invoke the `optimize()` function to apply optimizations.
3. Convert the eager mode model to a graph mode model.
   - For TorchScript, invoke `torch.jit.trace()` and `torch.jit.freeze()`.
   - For TorchDynamo, invoke `torch.compile(model, backend="ipex")` (*Beta feature*).

**Important:** It is highly recommended to `import intel_extension_for_pytorch` right after `import torch`, prior to importing other packages.

The example below demonstrates how to use the Intel® Extension for PyTorch\* with TorchScript:

```python
import torch
############## import ipex ###############
import intel_extension_for_pytorch as ipex
##########################################

model = Model()
model.eval()
data = ...

############## TorchScript ###############
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast():
    model = torch.jit.trace(model, data)
    model = torch.jit.freeze(model)
    model(data)
##########################################
```

The example below demonstrates how to use the Intel® Extension for PyTorch\* with TorchDynamo:

```python
import torch
############## import ipex ###############
import intel_extension_for_pytorch as ipex
##########################################

model = Model()
model.eval()
data = ...

############## TorchDynamo ###############
model = ipex.optimize(model, weights_prepack=False)
model = torch.compile(model, backend="ipex")

with torch.no_grad():
    model(data)
##########################################
```

More examples, including training and usage of low-precision data types, are available in the [Examples](./examples.md) section; a minimal training sketch is also shown below.

In [Cheat Sheet](./cheat_sheet.md), you can find more commands that can help you start using the Intel® Extension for PyTorch\*.
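For reference, the same `optimize()` API also covers training when an optimizer is passed in, in which case it returns both the optimized model and the optimized optimizer. The snippet below is a minimal sketch of BF16 training usage; `Model()`, `train_loader`, and the loss/optimizer choices are illustrative placeholders:

```python
import torch
############## import ipex ###############
import intel_extension_for_pytorch as ipex
##########################################

model = Model()          # placeholder: your training model
model.train()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

############## training ##################
# When an optimizer is passed, optimize() returns both the model and the optimizer.
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

for data, target in train_loader:   # placeholder: your DataLoader
    optimizer.zero_grad()
    with torch.cpu.amp.autocast():
        output = model(data)
        loss = criterion(output, target)
    loss.backward()
    optimizer.step()
##########################################
```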
## LLM Quick Start

`ipex.llm.optimize` is used to apply the extension's optimizations to Large Language Models (LLMs).

```python
import torch
#################### code changes ####################
import intel_extension_for_pytorch as ipex
######################################################
import argparse
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
)

# args
parser = argparse.ArgumentParser("Generation script (fp32/bf16 path)", add_help=False)
parser.add_argument(
    "--dtype",
    type=str,
    choices=["float32", "bfloat16"],
    default="float32",
    help="choose the weight dtype and whether to enable auto mixed precision or not",
)
parser.add_argument(
    "--max-new-tokens", default=32, type=int, help="output max new tokens"
)
parser.add_argument(
    "--prompt", default="What are we having for dinner?", type=str, help="input prompt"
)
parser.add_argument("--greedy", action="store_true")
parser.add_argument("--batch-size", default=1, type=int, help="batch size")
args = parser.parse_args()
print(args)

# dtype
amp_enabled = True if args.dtype != "float32" else False
amp_dtype = getattr(torch, args.dtype)

# load model
model_id = MODEL_ID
config = AutoConfig.from_pretrained(
    model_id, torchscript=True, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=amp_dtype,
    config=config,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_id, trust_remote_code=True
)
model = model.eval()
model = model.to(memory_format=torch.channels_last)

# Intel(R) Extension for PyTorch*
#################### code changes ####################  # noqa F401
model = ipex.llm.optimize(
    model,
    dtype=amp_dtype,
    inplace=True,
    deployment_mode=True,
)
######################################################  # noqa F401

# generate args
num_beams = 1 if args.greedy else 4
generate_kwargs = dict(do_sample=False, temperature=0.9, num_beams=num_beams)

# input prompt
prompt = args.prompt
input_size = tokenizer(prompt, return_tensors="pt").input_ids.size(dim=1)
print("---- Prompt size:", input_size)
prompt = [prompt] * args.batch_size

# inference
with torch.inference_mode(), torch.cpu.amp.autocast(enabled=amp_enabled):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    gen_ids = model.generate(
        input_ids, max_new_tokens=args.max_new_tokens, **generate_kwargs
    )
    gen_text = tokenizer.batch_decode(gen_ids, skip_special_tokens=True)
    input_tokens_lengths = [x.shape[0] for x in input_ids]
    output_tokens_lengths = [x.shape[0] for x in gen_ids]
    total_new_tokens = [
        o - i for i, o in zip(input_tokens_lengths, output_tokens_lengths)
    ]
    print(gen_text, total_new_tokens, flush=True)
```

More LLM examples, including usage of low-precision data types, are available in the [LLM Examples](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/inference/python/llm) section.
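Assuming the snippet above is saved as a standalone script (for example, a hypothetical `run_generation.py`) and `MODEL_ID` is replaced with a valid Hugging Face model identifier, it could be launched with something like `python run_generation.py --dtype bfloat16 --max-new-tokens 64`.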