The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Patent No.:

US 11935179 B1

Date of Patent:

Mar. 19, 2024

Filed:

Mar. 15, 2023

Title:

Fully-fused neural network execution

Applicant:

Nvidia Corporation, Santa Clara, CA (US);

Inventors:

Thomas Müller, Rheinfelden, DE;

Nikolaus Binder, Berlin, DE;

Fabrice Pierre Armand Rousselle, Ostermundigen, CH;

Jan Novák, Meilen, CH;

Alexander Georg Keller, Berlin, DE;

Assignee:

NVIDIA Corporation, Santa Clara, CA (US);

Attorney:

Leydig, Voit & Mayer, Ltd.

Primary Examiner:

Andrew G Yang

Int. Cl.

CPC ...

G06T 15/06 (2011.01); G06N 3/10 (2006.01); G06T 15/00 (2011.01); G06T 15/50 (2011.01);

U.S. Cl.

CPC ...

G06T 15/06 (2013.01); G06N 3/10 (2013.01); G06T 15/005 (2013.01); G06T 15/506 (2013.01); G06T 2210/52 (2013.01);

Abstract

A fully-connected neural network may be configured for execution by a processor as a fully-fused neural network by limiting slow global memory accesses to reading the inputs to, and writing the outputs from, the fully-connected neural network. The computational cost of a fully-connected neural network scales quadratically with its width, whereas its memory traffic scales linearly. Modern graphics processing units typically have much greater computational throughput than memory bandwidth, so for narrow fully-connected neural networks the linear memory traffic is the bottleneck. The key to improving performance of the fully-connected neural network is to minimize traffic to slow “global” memory (off-chip memory and high-level caches) and to fully utilize fast on-chip memory (low-level caches, “shared” memory, and registers), which the fully-fused approach achieves. A real-time neural radiance caching technique for path-traced global illumination is implemented using the fully-fused neural network for caching scattered radiance components of global illumination.
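The scaling argument in the abstract can be illustrated with a rough cost model. The sketch below is not taken from the patent: the function name `mlp_cost`, the square `width x width` layers, the 2-byte (half-precision) values, and the choice to ignore weight traffic are all assumptions made for illustration. It counts floating-point operations per layer as quadratic in the width and activation traffic as linear in the width, then compares a layer-by-layer execution (every layer reads and writes global memory) with a fused execution (only the network input and final output touch global memory).

```python
def mlp_cost(width, depth, batch, bytes_per_value=2):
    """Rough cost model for a fully-connected network (illustrative assumption).

    Assumes `depth` square weight matrices of size width x width and ignores
    weight traffic, i.e. weights are assumed to stay in fast on-chip memory.
    Returns (flops, unfused_bytes, fused_bytes).
    """
    # One matrix-vector product per layer: width*width multiply-adds,
    # counted as 2 FLOPs each. Quadratic in width.
    flops = 2 * width * width * depth * batch

    # Layer-by-layer execution: each layer reads `width` input activations
    # from global memory and writes `width` outputs back. Linear in width.
    unfused_bytes = 2 * width * depth * batch * bytes_per_value

    # Fused execution: only the network input is read and only the final
    # output is written; intermediate activations stay on-chip.
    fused_bytes = 2 * width * batch * bytes_per_value

    return flops, unfused_bytes, fused_bytes


def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of global memory traffic."""
    return flops / bytes_moved


if __name__ == "__main__":
    # Hypothetical narrow network: width 64, 5 layers, one input sample.
    flops, unfused, fused = mlp_cost(width=64, depth=5, batch=1)
    print("unfused FLOP/byte:", arithmetic_intensity(flops, unfused))
    print("fused FLOP/byte:  ", arithmetic_intensity(flops, fused))
```

Under these assumptions, doubling the width quadruples the operation count but only doubles the activation traffic, so narrow networks have low arithmetic intensity and are limited by memory bandwidth. In this model, fusing raises the intensity by a factor equal to the number of layers, because the intermediate reads and writes disappear from the global memory count.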
