The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf).

Date of Patent: Jun. 03, 2025
Filed: Apr. 19, 2023
Applicant: The Toronto-Dominion Bank, Toronto, CA
Inventors: Maksims Volkovs, Toronto, CA; Xiao Shi Huang, Toronto, CA; Juan Felipe Perez Vallejo, Toronto, CA
Assignee:
Attorney:
Primary Examiner:
Int. Cl. CPC ... G06F 9/355 (2018.01); G06F 9/30 (2018.01); G06F 9/46 (2006.01); G06F 18/214 (2023.01); G06F 18/2413 (2023.01); G06N 3/045 (2023.01); G06N 3/084 (2023.01); G06N 20/00 (2019.01)
U.S. Cl. CPC ... G06N 3/084 (2013.01); G06F 9/30036 (2013.01); G06F 9/3555 (2013.01); G06F 9/463 (2013.01); G06F 18/214 (2023.01); G06N 20/00 (2019.01)
Abstract

An online system trains a transformer architecture using an initialization method that allows the transformer architecture to be trained without normalization layers or learning rate warmup, resulting in significant improvements in computational efficiency for transformer architectures. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates a set of attention representations by applying a key matrix to an input key, a query matrix to an input query, and a value matrix to an input value to generate an output, and then applying an output matrix to that output. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix by a factor that is inversely related to the number of encoders or the number of decoders in the transformer architecture.
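The abstract describes the attention block and the rescaling only in prose. The following is a minimal pure-Python sketch of those two ideas, not the patented implementation. It assumes a single attention head and standard scaled dot-product softmax attention (the abstract does not say how the projected key, query, and value are combined), and it assumes a Xavier-uniform base initialization. The helper names (`init_attention_block`, `attention_block`) and the `exponent` parameter are hypothetical; the abstract states only that the value and output matrices are scaled by a factor inverse to the number of encoders or decoders, so the sketch uses `num_layers ** -exponent` with `exponent=1.0` as one literal reading.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def scale(a: Matrix, factor: float) -> Matrix:
    """Multiply every entry of a matrix by a scalar."""
    return [[factor * x for x in row] for row in a]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def xavier_uniform(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Assumed base initialization; the abstract does not name one.
    limit = math.sqrt(6.0 / (rows + cols))
    return [[rng.uniform(-limit, limit) for _ in range(cols)] for _ in range(rows)]


def init_attention_block(d_model: int, num_layers: int, rng: random.Random,
                         exponent: float = 1.0) -> Dict[str, Matrix]:
    """Hypothetical initializer: draw all four projection matrices, then shrink
    only the value (W_v) and output (W_o) matrices by num_layers ** -exponent.
    exponent=1.0 is an assumed, literal reading of "inverse to a number"."""
    factor = num_layers ** (-exponent)
    return {
        "W_q": xavier_uniform(d_model, d_model, rng),
        "W_k": xavier_uniform(d_model, d_model, rng),
        "W_v": scale(xavier_uniform(d_model, d_model, rng), factor),
        "W_o": scale(xavier_uniform(d_model, d_model, rng), factor),
    }


def attention_block(params: Dict[str, Matrix], query: Matrix, key: Matrix,
                    value: Matrix) -> Matrix:
    """Single-head attention: project query/key/value, combine them with
    (assumed) scaled dot-product softmax attention, then apply W_o."""
    q = matmul(query, params["W_q"])  # query matrix applied to the input query
    k = matmul(key, params["W_k"])    # key matrix applied to the input key
    v = matmul(value, params["W_v"])  # value matrix applied to the input value
    scores = scale(matmul(q, transpose(k)), 1.0 / math.sqrt(len(q[0])))
    output = matmul(softmax_rows(scores), v)
    return matmul(output, params["W_o"])  # output matrix -> attention representations


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_attention_block(d_model=4, num_layers=6, rng=rng)
    x = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
    reps = attention_block(params, x, x, x)
    print(len(reps), len(reps[0]))  # prints: 3 4
```

Because the softmax weights depend only on the query and key matrices, scaling the value and output matrices by a factor c leaves the attention pattern unchanged at initialization and shrinks the block's output by c²; with six layers and `exponent=1.0`, that is a factor of 1/36.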