The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Feb. 14, 2023

Filed:
Jun. 24, 2020

Applicant:
Cerebras Systems Inc., Sunnyvale, CA (US);

Inventors:
Sean Lie, Los Gatos, CA (US);
Michael Morrison, Sunnyvale, CA (US);
Michael Edwin James, San Carlos, CA (US);
Gary R. Lauterbach, Los Altos, CA (US);
Srikanth Arekapudi, Santa Clara, CA (US);

Assignee:
Cerebras Systems Inc., Sunnyvale, CA (US);

Attorneys:
Primary Examiner:
Int. Cl.
CPC ...
G06N 3/02 (2006.01); G06N 3/08 (2006.01); G06F 9/455 (2018.01); G06N 3/063 (2023.01); G06N 3/084 (2023.01); G06N 3/04 (2023.01); G06N 3/10 (2006.01);
U.S. Cl.
CPC ...
G06N 3/08 (2013.01); G06F 9/45533 (2013.01); G06N 3/02 (2013.01); G06N 3/04 (2013.01); G06N 3/0481 (2013.01); G06N 3/063 (2013.01); G06N 3/084 (2013.01); G06N 3/10 (2013.01); G06N 3/0454 (2013.01);
Abstract

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency, such as accuracy of learning, accuracy of prediction, speed of learning, performance of learning, and energy efficiency of learning. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has processing resources and memory resources. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Stochastic gradient descent, mini-batch gradient descent, and continuous propagation gradient descent are techniques usable to train weights of a neural network modeled by the processing elements. Reverse checkpoint is usable to reduce memory usage during the training.
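The abstract names stochastic gradient descent and mini-batch gradient descent as usable training techniques. As a rough illustration of how the two differ, the sketch below fits a one-variable linear model in plain Python. It is a generic textbook version, not the method claimed in the patent: the function `train`, the toy data, the learning rate, and the epoch count are all assumptions made for this example, and it does not model processing elements, wavelets, continuous propagation, or reverse checkpoint.

```python
import random


def train(xs, ys, batch_size, lr=0.05, epochs=2000, seed=0):
    """Fit y ~ w*x + b by gradient descent on mean squared error.

    batch_size=1 gives stochastic gradient descent (one sample per update);
    a larger batch_size gives mini-batch gradient descent.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average the squared-error gradient over the current batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy, noise-free data drawn from y = 2x + 1 (an assumption for the example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]

w_sgd, b_sgd = train(xs, ys, batch_size=1)  # stochastic: one sample per step
w_mb, b_mb = train(xs, ys, batch_size=2)    # mini-batch: two samples per step
```

The only difference between the two calls is how many samples contribute to each weight update. On this noise-free data both settings should approach the same weights, roughly `w = 2` and `b = 1`.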

