Modeling Application Performance under Multi-Instance (Multi-Stream) Execution Scenarios
Multi-stream execution is a GPU technique that allows multiple independent operations/kernels from the same program to run concurrently, making effective use of the GPU without explicitly binding threads to specific compute units. Several recent optimizations in Machine Learning (ML) algorithms leverage multi-stream execution. While performance modeling of ML applications is well studied under single-stream execution, performance models for novel ML applications under multi-stream execution are lacking. There is a pressing need to develop performance models for multi-stream execution, and that will be the primary area of exploration for this co-op/internship. Specifically, the intern is expected to survey the state of the art (SOTA) in performance modeling for multi-stream execution, develop first-principles performance models, validate them against silicon performance, and integrate them into an internal simulator developed at AMD.
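To make the modeling task concrete, here is a minimal, illustrative sketch of what a first-principles model for multi-stream execution might look like. It is not AMD's internal model: it treats each kernel as roofline-limited (bounded by the slower of compute and memory traffic) and assumes concurrent streams overlap perfectly while splitting shared peak throughput. All hardware numbers and kernel sizes below are hypothetical.

```python
# Illustrative first-principles sketch (assumption, not AMD's internal model):
# estimate multi-stream speedup by treating each kernel as compute- or
# bandwidth-bound and assuming concurrent streams share peak FLOPs/bandwidth.
from dataclasses import dataclass

@dataclass
class Kernel:
    flops: float        # total floating-point operations
    bytes_moved: float  # total DRAM traffic in bytes

def kernel_time(k, peak_flops, peak_bw):
    """Roofline lower bound: time is set by the slower of compute and memory."""
    return max(k.flops / peak_flops, k.bytes_moved / peak_bw)

def single_stream_time(kernels, peak_flops, peak_bw):
    """Kernels run back-to-back on one stream (no overlap)."""
    return sum(kernel_time(k, peak_flops, peak_bw) for k in kernels)

def multi_stream_time(kernels, peak_flops, peak_bw):
    """Optimistic multi-stream bound: perfect overlap, so total time is the
    larger of aggregate compute demand, aggregate memory demand, and the
    single longest kernel."""
    total_flops = sum(k.flops for k in kernels)
    total_bytes = sum(k.bytes_moved for k in kernels)
    longest = max(kernel_time(k, peak_flops, peak_bw) for k in kernels)
    return max(total_flops / peak_flops, total_bytes / peak_bw, longest)

# Example: one compute-bound and one bandwidth-bound kernel on a
# hypothetical GPU with 50 TFLOP/s compute and 1 TB/s memory bandwidth.
peak_flops, peak_bw = 50e12, 1e12
a = Kernel(flops=5e12, bytes_moved=1e9)    # compute-bound
b = Kernel(flops=1e9, bytes_moved=5e11)    # bandwidth-bound

serial = single_stream_time([a, b], peak_flops, peak_bw)
overlap = multi_stream_time([a, b], peak_flops, peak_bw)
print(f"serial: {serial*1e3:.1f} ms, overlapped: {overlap*1e3:.1f} ms")
```

The interesting case the sketch exposes is exactly the one multi-stream execution targets: a compute-bound kernel and a bandwidth-bound kernel stress different resources, so overlapping them recovers most of the shorter kernel's time. Validating where this optimistic bound breaks down on real silicon (scheduler limits, cache interference, launch overheads) is part of the proposed work.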