The patent badge is an abbreviated version of the USPTO patent document. It lists the patent number, issue date, filing date, title, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also links to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:

Sep. 02, 2025

Filed:

Jul. 26, 2022

Applicant:

Nvidia Corporation, Santa Clara, CA (US);

Inventors:

Charbel Sakr, Mountain View, CA (US);

Steve Haihang Dai, Union City, CA (US);

Brucek Kurdo Khailany, Austin, TX (US);

William James Dally, Incline Village, NV (US);

Rangharajan Venkatesan, San Jose, CA (US);

Brian Matthew Zimmer, Berkeley, CA (US);

Assignee:

NVIDIA Corporation, Santa Clara, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 7/00 (2006.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01);
U.S. Cl.
CPC ...
G06N 3/04 (2013.01); G06N 3/08 (2013.01);
Abstract

Quantizing tensors and vectors processed within a neural network reduces power consumption and may accelerate processing. Quantization reduces the number of bits used to represent a value, where decreasing the number of bits used can decrease the accuracy of computations that use the value. Ideally, quantization is performed without reducing accuracy. Quantization-aware training (QAT) is performed by dynamically quantizing tensors (weights and activations) using optimal clipping scalars. The clipping scalars are 'optimal' in that the mean squared error (MSE) of the quantized operation is minimized, and they define the degree or amount of quantization for various tensors of the operation. Conventional techniques that quantize tensors during training suffer from high amounts of noise (error). Other techniques compute the clipping scalars offline through a brute force search to provide high accuracy. In contrast, the optimal clipping scalars can be computed online and provide the same accuracy as the clipping scalars computed offline.
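
To make the terms in the abstract concrete, here is a minimal Python sketch of the offline baseline it mentions: a brute-force search for the clipping scalar that minimizes quantization MSE. It is not the patented online method, which the abstract does not describe in detail. The symmetric uniform quantizer, the function names (quantize, mse, brute_force_clip), the candidate grid, and the Gaussian test tensor are all assumptions made for illustration.

```python
import random


def quantize(values, clip, bits):
    """Clip each value to [-clip, clip], then round it to a signed uniform grid.

    Assumption: a symmetric quantizer with 2**(bits - 1) - 1 positive levels.
    """
    levels = 2 ** (bits - 1) - 1
    step = clip / levels
    quantized = []
    for v in values:
        clipped = max(-clip, min(clip, v))
        quantized.append(round(clipped / step) * step)
    return quantized


def mse(values, clip, bits):
    """Mean squared error between the original and quantized values."""
    q = quantize(values, clip, bits)
    return sum((v - w) ** 2 for v, w in zip(values, q)) / len(values)


def brute_force_clip(values, bits, candidates=100):
    """Try evenly spaced clipping scalars up to max|x| and keep the best one."""
    max_abs = max(abs(v) for v in values)
    grid = [max_abs * ((i + 1) / candidates) for i in range(candidates)]
    return min(grid, key=lambda s: mse(values, s, bits))


# Hypothetical test tensor: 2000 samples from a standard normal distribution.
random.seed(0)
tensor = [random.gauss(0.0, 1.0) for _ in range(2000)]
max_abs = max(abs(v) for v in tensor)
best = brute_force_clip(tensor, bits=4)

print("clip at max|x|:", max_abs, "MSE:", mse(tensor, max_abs, 4))
print("best clip found:", best, "MSE:", mse(tensor, best, 4))
```

The search illustrates the trade-off a clipping scalar controls: a small scalar clips more large values but gives finer steps, while a large scalar clips less but rounds more coarsely. Each candidate costs a full pass over the tensor, which is why this kind of search is typically run offline rather than inside a training loop.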

