Quick Start with ptiView API

Warning

DRAFT DOCUMENTATION - This documentation is currently in draft status and subject to change.

This guide will help you get started with PTI SDK. By the end, you’ll be able to trace GPU kernels and understand the basic API workflow.

Prerequisites

Before starting, ensure PTI SDK is installed. See Building and Installing for installation instructions.

Environment Setup

Set up the Intel(R) oneAPI environment:

Linux:

source <path_to_oneapi>/setvars.sh

Windows:

Open the Intel(R) oneAPI Command Prompt.

Basic Usage Pattern

The ptiView API follows this pattern:

Define callbacks for buffer allocation and data processing
Register callbacks with ptiViewSetCallbacks()
Enable tracing with ptiViewEnable()
Run your application
Disable tracing with ptiViewDisable()
Flush remaining data with ptiFlushAllViews()

Running Your First Sample

Let’s run the vector square-add sample to see PTI SDK in action.

Step 1: Navigate to the build directory

cd build

Step 2: Run the sample

Linux:

./bin/vec_sqadd

Windows:

bin\vec_sqadd.exe

Understanding the Output

The sample will display:

Device Information: GPU device being used
Kernel Traces: Information about GPU kernels executed
Memory Operations: Data transfers between host and device
Timing Information: Timestamps and durations

Example output:

>>>> [123456789] zeKernelCreate: zeKernel = 0xdeadbeef
>>>> [123456790] zeCommandListAppendLaunchKernel
Kernel: VecSq, Duration: 1234 ns
>>>> [123456800] zeCommandListAppendMemoryCopy
Memory Copy: Host->Device, Size: 20000 bytes, Duration: 567 ns

Basic API Example

Here’s a minimal example showing the PTI SDK API usage:

#include "pti/pti_view.h"
#include <iostream>

// Callback to allocate buffer
void BufferRequested(unsigned char** buf, size_t* buf_size) {
    *buf_size = 1024 * 1024;  // 1 MB
    *buf = new unsigned char[*buf_size];
}

// Callback to process collected data
void BufferCompleted(unsigned char* buf, size_t buf_size, size_t valid_buf_size) {
    if (!buf || !valid_buf_size) {
        delete[] buf;  // Free empty buffers too; delete[] on nullptr is a no-op
        return;
    }

    pti_view_record_base* ptr = nullptr;
    while (true) {
        auto status = ptiViewGetNextRecord(buf, valid_buf_size, &ptr);
        if (status == PTI_STATUS_END_OF_BUFFER) break;
        if (status != PTI_SUCCESS) {
            std::cerr << "Error reading record" << std::endl;
            break;
        }

        if (ptr->_view_kind == PTI_VIEW_DEVICE_GPU_KERNEL) {
            auto* kernel = reinterpret_cast<pti_view_record_kernel*>(ptr);
            std::cout << "Kernel: " << kernel->_name
                      << ", Duration: " << (kernel->_end_timestamp - kernel->_start_timestamp)
                      << " ns" << std::endl;
        }
    }
    delete[] buf;
}

int main() {
    // 1. Register callbacks for buffer allocation and data processing
    ptiViewSetCallbacks(BufferRequested, BufferCompleted);

    // 2. Enable tracing for kernels
    ptiViewEnable(PTI_VIEW_DEVICE_GPU_KERNEL);

    // 3. Run your SYCL/Level-Zero application
    // ... your GPU workload here ...

    // 4. Disable tracing
    ptiViewDisable(PTI_VIEW_DEVICE_GPU_KERNEL);

    // 5. Flush any remaining buffered data
    ptiFlushAllViews();

    return 0;
}
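
The callback above prints one line per kernel record. A tool would often aggregate instead, so the following is a minimal sketch of a per-kernel summary. KernelStats and KernelSummary are hypothetical names that are not part of PTI SDK, and the timestamps are assumed to be in nanoseconds, as the " ns" label in the example suggests.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper types, not part of PTI SDK.
struct KernelStats {
    std::uint64_t calls = 0;     // number of records seen for this kernel
    std::uint64_t total_ns = 0;  // sum of (end - start) over those records
};

class KernelSummary {
public:
    // Record one kernel execution. Timestamps are assumed to be nanoseconds.
    void Add(const std::string& name, std::uint64_t start_ns, std::uint64_t end_ns) {
        if (end_ns < start_ns) return;  // skip inconsistent records instead of wrapping around
        KernelStats& s = stats_[name];
        ++s.calls;
        s.total_ns += end_ns - start_ns;
    }

    // Mean duration in nanoseconds, or 0 if the kernel was never recorded.
    double MeanNs(const std::string& name) const {
        auto it = stats_.find(name);
        if (it == stats_.end() || it->second.calls == 0) return 0.0;
        return static_cast<double>(it->second.total_ns) /
               static_cast<double>(it->second.calls);
    }

    const std::map<std::string, KernelStats>& All() const { return stats_; }

private:
    std::map<std::string, KernelStats> stats_;
};
```

Inside BufferCompleted, the std::cout statement could then be replaced by a call such as summary.Add(kernel->_name, kernel->_start_timestamp, kernel->_end_timestamp), where summary is an application-owned object. If the callback can run on more than one thread, access to it would need a lock.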

Available Tracing Views

PTI SDK supports tracing different types of activities:

Device Operations:

PTI_VIEW_DEVICE_GPU_KERNEL - GPU kernel execution on the device
PTI_VIEW_DEVICE_GPU_MEM_COPY - Memory copy operations between host and device
PTI_VIEW_DEVICE_GPU_MEM_FILL - Memory fill operations on the device
PTI_VIEW_DEVICE_GPU_MEM_COPY_P2P - Peer-to-peer memory copies between devices
PTI_VIEW_DEVICE_SYNCHRONIZATION - Synchronization operations on host and GPU (barriers, fences, events)

API Tracing:

PTI_VIEW_RUNTIME_API - Runtime API calls (SYCL)
PTI_VIEW_DRIVER_API - Driver/back-end API calls (Level-Zero)

Profiling Support:

PTI_VIEW_EXTERNAL_CORRELATION - Application-level correlation IDs for connecting GPU activities with user annotations
PTI_VIEW_COLLECTION_OVERHEAD - Profiling overhead tracking

Communication:

PTI_VIEW_COMMUNICATION - Communication operations via Intel® oneCCL (Linux only)

You can enable multiple views simultaneously:

ptiViewEnable(PTI_VIEW_DEVICE_GPU_KERNEL);
ptiViewEnable(PTI_VIEW_DEVICE_GPU_MEM_COPY);
ptiViewEnable(PTI_VIEW_RUNTIME_API);
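
When several views are enabled together, one of the calls may fail partway through. The sketch below shows one possible way to handle that: enable the views in order and roll back the ones already enabled if a later one fails. EnableAll is a hypothetical helper, not a PTI SDK function. It takes the enable and disable operations as callables so that it compiles without the PTI headers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of PTI SDK. `View` stands in for the view
// kind type; `enable` and `disable` are callables that return true on
// success (for example, lambdas that compare the result of ptiViewEnable()
// or ptiViewDisable() with PTI_SUCCESS).
template <typename View, typename EnableFn, typename DisableFn>
bool EnableAll(const std::vector<View>& views, EnableFn enable, DisableFn disable) {
    for (std::size_t i = 0; i < views.size(); ++i) {
        if (!enable(views[i])) {
            // Roll back the views enabled so far, in reverse order.
            while (i > 0) {
                --i;
                disable(views[i]);
            }
            return false;
        }
    }
    return true;
}
```

With this approach a failed setup leaves no view half-enabled, so the application can report the error and continue without tracing.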

Running Tests

To verify your PTI SDK installation, run the test suite:

Linux:

cd build
make test

Or with CTest for detailed output:

ctest --output-on-failure

Windows:

cd build
ninja test

Or with CTest:

ctest --output-on-failure

On-Demand Collection

PTI SDK supports on-demand collection: enable and disable tracing around specific code sections so that code outside the profiled regions runs without tracing overhead:

// Application initialization
setup_callbacks();

// No overhead here - tracing is not enabled
some_work();

// Start tracing - this is where collection begins
ptiViewEnable(PTI_VIEW_DEVICE_GPU_KERNEL);

// Trace this section
important_work();

// Stop tracing - zero overhead resumes
ptiViewDisable(PTI_VIEW_DEVICE_GPU_KERNEL);

// Flush remaining buffered data
ptiFlushAllViews();

// No overhead here again
more_work();
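
The enable/disable pairing above can also be tied to a C++ scope, so that tracing is switched off even when the traced section returns early or throws. The following is a sketch of that idea, not part of PTI SDK: ScopedTrace is a hypothetical class, and the start and stop actions are supplied by the caller. It relies on C++17 class template argument deduction.

```cpp
#include <utility>

// Hypothetical RAII guard, not part of PTI SDK. `start` and `stop` are
// callables supplied by the application, for example lambdas that wrap
// ptiViewEnable() and ptiViewDisable() for one or more view kinds.
template <typename Start, typename Stop>
class ScopedTrace {
public:
    // Runs `start` immediately and keeps `stop` for the destructor.
    ScopedTrace(Start start, Stop stop) : stop_(std::move(stop)) { start(); }
    ~ScopedTrace() { stop_(); }

    // Non-copyable (and therefore non-movable) so that `stop` runs exactly once.
    ScopedTrace(const ScopedTrace&) = delete;
    ScopedTrace& operator=(const ScopedTrace&) = delete;

private:
    Stop stop_;
};

// Possible usage, assuming pti/pti_view.h is included:
//
//   {
//       ScopedTrace guard([] { ptiViewEnable(PTI_VIEW_DEVICE_GPU_KERNEL); },
//                         [] { ptiViewDisable(PTI_VIEW_DEVICE_GPU_KERNEL); });
//       important_work();
//   }  // tracing is disabled here, even if important_work() throws
//   ptiFlushAllViews();
```

The guard only covers enabling and disabling. Flushing with ptiFlushAllViews() is still a separate step, as in the example above.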

Next Steps

Now that you’ve run your first sample, explore more:

Examine the samples in the samples/ directory for real-world usage patterns:
- vector_sq_add - Basic tracing
- dpc_gemm - GEMM with performance tracking
- callback - Advanced callback usage
- metrics_scope - Hardware metrics collection

Check the API Reference for function signatures and data structure definitions:
- PTI View API Reference - Complete PTI View API for tracing
- PTI Metrics Scope API Reference - Per-kernel hardware metrics collection
- PTI Metrics API Reference - Device-level metrics collection
- PTI Callback API Reference (Experimental) - Advanced callback patterns

Troubleshooting

No output from sample:

Ensure oneAPI environment is set up (setvars.sh)
Verify GPU drivers are installed
Check that Level-Zero loader is available

Callbacks not called:

Ensure ptiViewSetCallbacks() is called before ptiViewEnable()
Verify that ptiViewSetCallbacks() returned PTI_SUCCESS
Check that you’re enabling the correct view types
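
One way to catch the problems listed above early is to check the status of every setup call instead of discarding it. The sketch below assumes that the PTI calls return a status value that can be compared with PTI_SUCCESS, as ptiViewGetNextRecord() does in the Basic API Example. CheckStatus is a hypothetical helper, not part of PTI SDK, and takes the success value as a parameter so that it compiles without the PTI headers.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of PTI SDK. `Status` stands in for the PTI
// status type and `ok` for PTI_SUCCESS.
template <typename Status>
void CheckStatus(Status actual, Status ok, const std::string& what) {
    if (actual != ok) {
        throw std::runtime_error(what + " failed with status " +
                                 std::to_string(static_cast<long long>(actual)));
    }
}

// Possible usage, assuming pti/pti_view.h is included and the callbacks from
// the Basic API Example are defined:
//
//   CheckStatus(ptiViewSetCallbacks(BufferRequested, BufferCompleted),
//               PTI_SUCCESS, "ptiViewSetCallbacks");
//   CheckStatus(ptiViewEnable(PTI_VIEW_DEVICE_GPU_KERNEL),
//               PTI_SUCCESS, "ptiViewEnable");
```

Because the calls are checked in order, a failure in ptiViewSetCallbacks() is reported before ptiViewEnable() is ever reached.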

Build errors:

See the Building and Installing guide for build requirements
Ensure C++17 support is available

Performance issues:

Use on-demand collection (ptiViewEnable/ptiViewDisable) to reduce overhead
Consider reducing callback complexity
Profile only regions of interest
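
As an illustration of reducing callback complexity, the completion callback can hand each buffer to a queue and return, leaving parsing to a later point in the application. This is a sketch under the assumption that the application owns a buffer once the completion callback has been invoked, which the delete[] in the Basic API Example suggests. BufferQueue and CompletedBuffer are hypothetical names, not part of PTI SDK.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical hand-off types, not part of PTI SDK.
struct CompletedBuffer {
    unsigned char* data;
    std::size_t valid_bytes;
};

class BufferQueue {
public:
    // Intended to be called from the buffer-completed callback: constant-time
    // work under a short lock.
    void Push(unsigned char* data, std::size_t valid_bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(CompletedBuffer{data, valid_bytes});
    }

    // Intended to be called later from an application thread. `process`
    // receives each queued buffer and is responsible for parsing and freeing
    // it. Returns the number of buffers handed to `process`.
    template <typename Process>
    std::size_t Drain(Process process) {
        std::deque<CompletedBuffer> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);  // take everything queued so far
        }
        for (const CompletedBuffer& b : batch) process(b);
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::deque<CompletedBuffer> pending_;
};
```

The record-parsing loop from BufferCompleted would move into the callable passed to Drain(), so the time spent inside the PTI callback stays short. Buffers still queued at shutdown need one final Drain() after ptiFlushAllViews().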

For more help, see the GitHub repository or submit an issue.
