Quick Start with ptiView API#
Warning
DRAFT DOCUMENTATION - This documentation is currently in draft status and subject to change.
This guide will help you get started with PTI SDK. By the end, you’ll be able to trace GPU kernels and understand the basic API workflow.
Prerequisites#
Before starting, ensure PTI SDK is installed. See Building and Installing for installation instructions.
Environment Setup#
Set up the Intel(R) oneAPI environment:
Linux:
source <path_to_oneapi>/setvars.sh
Windows:
Open the Intel(R) oneAPI Command Prompt.
Basic Usage Pattern#
The ptiView API follows this pattern:
Define callbacks for buffer allocation and data processing
Register callbacks with ptiViewSetCallbacks()
Enable tracing with ptiViewEnable()
Run your application
Disable tracing with ptiViewDisable()
Flush remaining data with ptiFlushAllViews()
Running Your First Sample#
Let’s run the vector square-add sample to see PTI SDK in action.
Step 1: Navigate to the build directory
cd build
Step 2: Run the sample
Linux:
./bin/vec_sqadd
Windows:
bin\vec_sqadd.exe
Understanding the Output#
The sample will display:
Device Information: GPU device being used
Kernel Traces: Information about GPU kernels executed
Memory Operations: Data transfers between host and device
Timing Information: Timestamps and durations
Example output:
>>>> [123456789] zeKernelCreate: zeKernel = 0xdeadbeef
>>>> [123456790] zeCommandListAppendLaunchKernel
Kernel: VecSq, Duration: 1234 ns
>>>> [123456800] zeCommandListAppendMemoryCopy
Memory Copy: Host->Device, Size: 20000 bytes, Duration: 567 ns
Basic API Example#
Here’s a minimal example showing the PTI SDK API usage:
#include "pti/pti_view.h"

#include <iostream>

// Callback to allocate buffer
void BufferRequested(unsigned char** buf, size_t* buf_size) {
  *buf_size = 1024 * 1024;  // 1 MB
  *buf = new unsigned char[*buf_size];
}

// Callback to process collected data
void BufferCompleted(unsigned char* buf, size_t buf_size, size_t valid_buf_size) {
  if (!buf || !valid_buf_size) {
    delete[] buf;  // nothing valid to process, but still release the buffer
    return;
  }
  pti_view_record_base* ptr = nullptr;
  while (true) {
    auto status = ptiViewGetNextRecord(buf, valid_buf_size, &ptr);
    if (status == PTI_STATUS_END_OF_BUFFER) break;
    if (status != PTI_SUCCESS) {
      std::cerr << "Error reading record" << std::endl;
      break;
    }
    if (ptr->_view_kind == PTI_VIEW_DEVICE_GPU_KERNEL) {
      auto* kernel = reinterpret_cast<pti_view_record_kernel*>(ptr);
      std::cout << "Kernel: " << kernel->_name
                << ", Duration: " << (kernel->_end_timestamp - kernel->_start_timestamp)
                << " ns" << std::endl;
    }
  }
  delete[] buf;
}

int main() {
  // 1. Register callbacks for buffer allocation and data processing
  ptiViewSetCallbacks(BufferRequested, BufferCompleted);

  // 2. Enable tracing for kernels
  ptiViewEnable(PTI_VIEW_DEVICE_GPU_KERNEL);

  // 3. Run your SYCL/Level-Zero application
  // ... your GPU workload here ...

  // 4. Disable tracing
  ptiViewDisable(PTI_VIEW_DEVICE_GPU_KERNEL);

  // 5. Flush any remaining buffered data
  ptiFlushAllViews();

  return 0;
}
Available Tracing Views#
PTI SDK supports tracing different types of activities:
Device Operations:
PTI_VIEW_DEVICE_GPU_KERNEL - GPU kernel execution on the device
PTI_VIEW_DEVICE_GPU_MEM_COPY - Memory copy operations between host and device
PTI_VIEW_DEVICE_GPU_MEM_FILL - Memory fill operations on the device
PTI_VIEW_DEVICE_GPU_MEM_COPY_P2P - Peer-to-peer memory copies between devices
PTI_VIEW_DEVICE_SYNCHRONIZATION - Synchronization operations on host and GPU (barriers, fences, events)
API Tracing:
PTI_VIEW_RUNTIME_API - Runtime API calls (SYCL)
PTI_VIEW_DRIVER_API - Driver/back-end API calls (Level-Zero)
Profiling Support:
PTI_VIEW_EXTERNAL_CORRELATION - Application-level correlation IDs for connecting GPU activities with user annotations
PTI_VIEW_COLLECTION_OVERHEAD - Profiling overhead tracking
Communication:
PTI_VIEW_COMMUNICATION - Communication operations via Intel® oneCCL (Linux only)
You can enable multiple views simultaneously:
ptiViewEnable(PTI_VIEW_DEVICE_GPU_KERNEL);
ptiViewEnable(PTI_VIEW_DEVICE_GPU_MEM_COPY);
ptiViewEnable(PTI_VIEW_RUNTIME_API);
Running Tests#
To verify your PTI SDK installation, run the test suite:
Linux:
cd build
make test
Or with CTest for detailed output:
ctest --output-on-failure
Windows:
cd build
ninja test
Or with CTest:
ctest --output-on-failure
On-Demand Collection#
PTI SDK supports on-demand profiling for zero overhead outside the profiled regions. You can enable and disable tracing around specific code sections to focus on areas of interest:
// Application initialization
setup_callbacks();
// No overhead here - tracing is not enabled
some_work();
// Start tracing - this is where collection begins
ptiViewEnable(PTI_VIEW_DEVICE_GPU_KERNEL);
// Trace this section
important_work();
// Stop tracing - zero overhead resumes
ptiViewDisable(PTI_VIEW_DEVICE_GPU_KERNEL);
// Flush remaining buffered data
ptiFlushAllViews();
// No overhead here again
more_work();
Next Steps#
Now that you’ve run your first sample, explore more:
Examine the samples in the samples/ directory:
vector_sq_add - Basic tracing
dpc_gemm - GEMM with performance tracking
callback - Advanced callback usage
metrics_scope - Hardware metrics collection
Check the API Reference for detailed documentation:
PTI View API Reference - Complete PTI View API for tracing
PTI Metrics Scope API Reference - Per-kernel hardware metrics collection
PTI Metrics API Reference - Device-level metrics collection
PTI Callback API Reference (Experimental) - Advanced callback patterns
Function signatures
Data structure definitions
Browse sample code at samples/ for real-world usage patterns
Troubleshooting#
- No output from sample:
Ensure the oneAPI environment is set up (setvars.sh)
Verify GPU drivers are installed
Check that Level-Zero loader is available
- Callbacks not called:
Ensure ptiViewSetCallbacks() is called before ptiViewEnable()
Verify callbacks are properly registered
Check that you’re enabling the correct view types
- Build errors:
See the Building and Installing guide for build requirements
Ensure C++17 support is available
- Performance issues:
Use on-demand collection (ptiViewEnable/ptiViewDisable) to reduce overhead
Consider reducing callback complexity
Profile only regions of interest
For more help, see the GitHub repository or submit an issue.