[Xipeng’s group ICS 2015] SM-centric GPU Scheduling and Locality-based Task Grouping

GPUs currently have a thread-centric model, where a task is the work specified by kernel(thread block ID). There are two important questions: when to schedule, which software can control through persistent threads, and where to schedule, which is the problem studied in this paper. The paper also groups tasks that share data.

Task co-location is important for locality and for resource utilization. Improper concurrent execution of kernels leads to resource conflicts, for example when one kernel demands so much shared memory or so many registers that another kernel cannot run.

The solution is SM-centric. Hardware starts each worker, which then runs tasks from a queue controlled by software. The paper has a scheme to start the same number of workers on each SM. In comparison, past work on persistent threads can run only one worker per SM.
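The worker scheme can be illustrated with a small host-side simulation. This is only a sketch under assumptions, not the paper's code: the function name sm_centric_run, the per-SM counter, and the round-robin order in which workers take tasks are all hypothetical. On a real GPU the counter would be an atomic and the workers on one SM would run concurrently.

```python
from collections import defaultdict, deque


def sm_centric_run(cta_to_sm, workers_per_sm, tasks_per_sm):
    """Simulate SM-centric scheduling on the host (hypothetical sketch).

    cta_to_sm:      SM id that each launched thread block landed on. The
                    hardware decides this; the program can only observe it.
    workers_per_sm: number of thread blocks kept as workers on each SM.
    tasks_per_sm:   dict mapping SM id -> list of task ids chosen by software.
    Returns a dict mapping SM id -> list of (worker slot, task id) pairs.
    """
    claimed = defaultdict(int)    # stands in for a per-SM atomic counter
    workers = defaultdict(list)   # SM id -> worker slots that stay alive
    for sm in cta_to_sm:
        slot = claimed[sm]
        claimed[sm] += 1
        if slot < workers_per_sm:
            workers[sm].append(slot)
        # Surplus thread blocks on this SM exit without doing any work.

    executed = {}
    for sm, slots in workers.items():
        queue = deque(tasks_per_sm.get(sm, []))
        log = []
        while queue:
            # Workers take turns here; on a GPU they would pull concurrently.
            for slot in slots:
                if not queue:
                    break
                log.append((slot, queue.popleft()))
        executed[sm] = log
    return executed


# Eight thread blocks land unevenly on two SMs; two workers are kept per SM.
result = sm_centric_run(
    cta_to_sm=[0, 1, 0, 1, 0, 1, 0, 0],
    workers_per_sm=2,
    tasks_per_sm={0: [0, 1, 2, 3], 1: [4, 5, 6]},
)
```

Even though SM 0 receives five thread blocks and SM 1 receives three, each SM ends up with exactly two workers, and every task runs on the SM that software chose for it.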

For irregular applications, the paper uses the GPU to partition the data/tasks into locality groups in parallel.
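To make the idea of a locality group concrete, here is a deliberately simple sketch. It is an assumption-laden stand-in, not the paper's partitioner: it runs sequentially on the host, assumes each task touches a single data block, and the name group_by_locality and the round-robin assignment of blocks to groups are hypothetical.

```python
def group_by_locality(task_data, num_groups):
    """Put tasks that read the same data block into the same group.

    task_data:  list where task_data[t] is the id of the data block read by
                task t (one block per task is a simplifying assumption).
    num_groups: number of locality groups, e.g. one per SM.
    Returns a list of groups, each a list of task ids.
    """
    block_to_group = {}
    groups = [[] for _ in range(num_groups)]
    for task, block in enumerate(task_data):
        if block not in block_to_group:
            # Assign each newly seen block to the next group in turn.
            block_to_group[block] = len(block_to_group) % num_groups
        groups[block_to_group[block]].append(task)
    return groups


# Tasks 0, 2 and 5 share block 7; tasks 1 and 4 share block 3.
groups = group_by_locality([7, 3, 7, 9, 3, 7], num_groups=2)
```

Each resulting group could then serve as the task queue of one SM, so that tasks sharing data run where that data is likely to be cached.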

The evaluation measures the effect in two ways: co-run ANTT speedup = mean(default Ti / opt Ti), where ANTT is the average normalized turnaround time and Ti is the turnaround time of kernel i, and co-run throughput.
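The ANTT speedup formula is just a mean of per-kernel ratios, as the following sketch shows. The function name antt_speedup and the timing numbers are made up for illustration and are not results from the paper.

```python
def antt_speedup(default_times, opt_times):
    """Mean over kernels of (default turnaround time / optimized time)."""
    if not default_times or len(default_times) != len(opt_times):
        raise ValueError("need one default and one optimized time per kernel")
    ratios = [d / o for d, o in zip(default_times, opt_times)]
    return sum(ratios) / len(ratios)


# Two co-running kernels: the first becomes 2x faster, the second is unchanged.
speedup = antt_speedup(default_times=[10.0, 8.0], opt_times=[5.0, 8.0])
print(speedup)  # 1.5
```

A value above 1 means the optimized schedule shortened turnaround time on average across the co-running kernels.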

Related work: Adriaens+:HPCA12’s study of co-run kernels.

