
NVIDIA CUDA Profiling Tools Interface (CUPTI)
The NVIDIA CUDA Profiling Tools Interface (CUPTI) is a library that enables the creation of profiling and tracing tools that target CUDA applications.
Vendor
NVIDIA
Product details
CUPTI provides a set of APIs that independent software vendors (ISVs) can use to build profilers and other performance optimization tools. These APIs include the Activity API, Callback API, Host Profiling API, Range Profiling API, PC Sampling API, SASS Metric API, PM Sampling API, Checkpoint API, Profiling API, Event API, Metric API, and the Python API. CUPTI ships with the CUDA Toolkit and is occasionally updated between toolkit releases.
Features
- CUDA API Trace: Register callbacks for CUDA API calls of interest, with entry and exit points supported in both the CUDA C Runtime (CUDART) and the CUDA Driver.
- GPU Workload Trace: Trace GPU activities, including kernel executions, memory copies, and memset operations.
- Unified Memory Trace: Trace transfers between host and device and from device to device, as well as page faults on the CPU and GPU.
- Normalized Timestamps: Provide normalized timestamps for CPU and GPU traces.
- Profile Event Counters: Profile hardware and software event counters, including utilization metrics, instruction count, memory events, cache hits/misses, and more.
- Automated Bottleneck Identification: Identify bottlenecks based on metrics like instruction throughput and memory throughput.
- Range Profiling: Enable metric collection over concurrent kernel launches within a range.
- Metrics Attribution: Attribute metrics to high-level source code and executed assembly instructions.
- Device-Wide Sampling: Sample the program counter (PC) device-wide, yielding a sample count for each source and assembly line, broken down by stall reason.
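
To make the bottleneck-identification feature more concrete, here is a minimal sketch of how a tool built on top of collected metrics might label a kernel. It does not call CUPTI. The `KernelMetrics` type, the `classify_bottleneck` function, the field names, and the 60% threshold are all hypothetical choices made for illustration. It assumes the two throughput values have already been collected and expressed as a percentage of the device's peak.

```python
from dataclasses import dataclass


@dataclass
class KernelMetrics:
    """Hypothetical per-kernel summary; not a CUPTI data structure."""
    name: str
    instruction_throughput_pct: float  # percent of peak instruction throughput
    memory_throughput_pct: float       # percent of peak memory throughput


def classify_bottleneck(m: KernelMetrics, high: float = 60.0) -> str:
    """Label a kernel by which throughput is close to its peak.

    The `high` threshold is an assumed cutoff, not a value defined by CUPTI.
    """
    compute_high = m.instruction_throughput_pct >= high
    memory_high = m.memory_throughput_pct >= high
    if compute_high and memory_high:
        return "compute-and-memory-bound"
    if compute_high:
        return "compute-bound"
    if memory_high:
        return "memory-bound"
    # Neither unit is busy: the kernel is likely stalled on latency
    # or is too small to fill the device.
    return "latency-bound"


if __name__ == "__main__":
    samples = [
        KernelMetrics("gemm", 85.0, 30.0),
        KernelMetrics("copy", 10.0, 90.0),
        KernelMetrics("tiny", 5.0, 8.0),
    ]
    for k in samples:
        print(f"{k.name}: {classify_bottleneck(k)}")
```

A real tool would obtain the two inputs through one of the profiling APIs listed above and would likely use more than two metrics. The point here is only that the classification step is a simple comparison once the metrics are available.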
Benefits
- Detailed Performance Insights: Gain comprehensive insights into the CPU and GPU behavior of CUDA applications.
- Low Profiling Overhead: Achieve low and deterministic profiling overhead on the target system.
- Enhanced Debugging: Improve debugging capabilities with detailed metrics and event counters.
- Automated Optimization: Build tools that automate bottleneck identification and guide optimization.
- Flexible Integration: Integrate CUPTI with existing profiling tools and workflows.