Overview
The CUDA Profiling Tools Interface (CUPTI) enables the creation of profiling and tracing tools that target CUDA applications. CUPTI provides the following APIs: the Activity API, the Callback API, the Event API, the Metric API, the Profiling API, the PC Sampling API and the Checkpoint API. Using these APIs, you can develop profiling tools that give insight into the CPU and GPU behavior of CUDA applications. CUPTI is delivered as a dynamic library on all platforms supported by CUDA.
In this CUPTI document, Tracing refers to the collection of events as they occur in time during the execution of a CUDA application. Tracing helps identify performance issues in CUDA code by showing which parts of a program take the most time. It captures timing information and relevant properties for CUDA API calls, kernels, memory copies, memory sets, unified memory activity, and so on. Tracing information can be collected using the Activity and Callback APIs.
In this CUPTI document, Profiling refers to the collection of GPU performance metrics for a single kernel or a set of kernels in isolation. Profiling might involve multiple replays of the kernel(s) or of the entire application to collect GPU performance metrics. These metrics can be collected using the Profiling, PC Sampling, Event, and Metric APIs.
What's New
- Two new fields, channelID and channelType, are added to the activity records for kernel, memcpy, peer-to-peer memcpy and memset to report the ID and type of the hardware channel on which these activities happen. Activity records CUpti_ActivityKernel6, CUpti_ActivityMemcpy4, CUpti_ActivityMemcpyPtoP3 and CUpti_ActivityMemset3 are deprecated and replaced by new activity records CUpti_ActivityKernel7, CUpti_ActivityMemcpy5, CUpti_ActivityMemcpyPtoP4 and CUpti_ActivityMemset4.
- New fields isMigEnabled, gpuInstanceId, computeInstanceId and migUuid are added to the device activity record to provide MIG information for MIG-enabled GPUs. Activity record CUpti_ActivityDevice3 is deprecated and replaced by a new activity record CUpti_ActivityDevice4.
- A new field utilizedSize is added in the memory pool activity record to provide the utilized size of the memory pool. Activity record CUpti_ActivityMemoryPool is deprecated and replaced by a new activity record CUpti_ActivityMemoryPool2.
- API cuptiActivityRegisterTimestampCallback and callback function type CUpti_TimestampCallbackFunc are added. They let a tool register a callback that supplies timestamps from a source of the user's choice, which are then used in activity records instead of the CUPTI-provided timestamps.
- The Profiling API supports profiling of OptiX applications.
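As an illustration of the timestamp callback item above, the following is a minimal sketch of a user-supplied timestamp source. It assumes that CUpti_TimestampCallbackFunc is a function taking no arguments and returning a 64-bit unsigned timestamp, and that nanoseconds are an appropriate unit; check both against the cupti.h header shipped with your toolkit. The names hostMonotonicTimestampNs, registerHostTimestamp and the USE_CUPTI build flag are hypothetical and exist only for this example.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical timestamp source: nanoseconds read from a monotonic host clock.
// The no-argument, uint64_t-returning shape is assumed to match
// CUpti_TimestampCallbackFunc (on some platforms the CUPTIAPI calling
// convention macro may also be needed on the declaration).
std::uint64_t hostMonotonicTimestampNs(void) {
    using namespace std::chrono;
    const auto sinceEpoch = steady_clock::now().time_since_epoch();
    return static_cast<std::uint64_t>(duration_cast<nanoseconds>(sinceEpoch).count());
}

#ifdef USE_CUPTI  // hypothetical flag: define only when building and linking against CUPTI
#include <cupti.h>

// Registers the function above as the timestamp source for activity records.
// Returns true if CUPTI accepted the callback.
bool registerHostTimestamp() {
    return cuptiActivityRegisterTimestampCallback(hostMonotonicTimestampNs) == CUPTI_SUCCESS;
}
#endif
```

A tool would presumably call registerHostTimestamp() once during initialization, before enabling any activity kinds, so that all collected records share the same clock. A monotonic clock is used here so that timestamps never go backwards between records; a tool that needs to correlate with wall-clock logs might choose a different source.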