The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:

Nov. 28, 2017

Filed:

Aug. 16, 2016

Applicant:

Nvidia Corporation, Santa Clara, CA (US);

Inventors:

Brian Fahs, Los Altos, CA (US);

Ming Y Siu, Santa Clara, CA (US);

Brett W. Coon, San Jose, CA (US);

John R. Nickolls, Los Altos, CA (US);

Lars Nyland, Carrboro, NC (US);

Assignee:

NVIDIA Corporation, Santa Clara, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 9/30 (2006.01); G06F 9/52 (2006.01); G06F 9/38 (2006.01); G06F 9/45 (2006.01);
U.S. Cl.
CPC ...
G06F 9/522 (2013.01); G06F 9/3004 (2013.01); G06F 9/30087 (2013.01); G06F 9/30145 (2013.01); G06F 9/3851 (2013.01); G06F 8/458 (2013.01);
Abstract

One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
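The behavior described in the abstract can be illustrated with a small software model. The sketch below is not the patented hardware mechanism; it only assumes the semantics stated above (each thread contributes a value, blocks until all threads arrive, and receives a reduction, while the scan value is fixed at the moment the thread arrives). The names `AggregatingBarrier`, `arrive_and_reduce`, and `run_demo` are hypothetical, and addition is chosen arbitrarily as the aggregation operator.

```python
import threading


class AggregatingBarrier:
    """Hypothetical software model of a barrier that also sums per-thread values."""

    def __init__(self, parties):
        self._parties = parties          # number of threads that must arrive
        self._cond = threading.Condition()
        self._count = 0                  # threads arrived in the current phase
        self._running = 0                # running sum of contributions so far
        self._generation = 0             # incremented each time the barrier opens
        self._result = 0                 # reduction result of the last phase

    def arrive_and_reduce(self, value):
        """Contribute `value`, wait for all parties, return (scan, reduction).

        `scan` is the inclusive prefix sum in arrival order, computed when the
        thread arrives; `reduction` is the sum over all threads, available only
        once every thread has arrived.
        """
        with self._cond:
            gen = self._generation
            self._running += value
            scan = self._running
            self._count += 1
            if self._count == self._parties:
                # Last arriver publishes the reduction and releases the others.
                self._result = self._running
                self._count = 0
                self._running = 0
                self._generation += 1
                self._cond.notify_all()
            else:
                # Wait until the phase this thread arrived in has completed.
                while gen == self._generation:
                    self._cond.wait()
            return scan, self._result


def run_demo(values):
    """Run one thread per value and collect each thread's (scan, reduction)."""
    barrier = AggregatingBarrier(len(values))
    results = [None] * len(values)

    def worker(i):
        results[i] = barrier.arrive_and_reduce(values[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_demo([1, 2, 3, 4])` gives every thread the same reduction value, while the scan values depend on the order in which the threads happened to arrive, so they can differ between runs.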

