Multi-stream execution is a GPU technique that allows multiple operations or kernels from the same program to use the GPU effectively without explicitly specifying the affinity of threads to cores. Several recent optimizations in Machine Learning (ML) algorithms leverage multi-stream execution. While performance modeling of ML applications is well studied under single-stream execution, performance models of […]