The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Jan. 26, 2021
Filed:
Nov. 23, 2018
Applicant:
Amazon Technologies, Inc., Seattle, WA (US);
Inventors:
Russell Reas, Seattle, WA (US);
Neela Sawant, Jersey City, NJ (US);
Srinivasan Sengamedu Hanumantha Rao, Seattle, WA (US);
Yinglong Wang, Redmond, WA (US);
Anton Emelyanov, Seattle, WA (US);
Shishir Sethiya, Seattle, WA (US);
Assignee:
Amazon Technologies, Inc., Seattle, WA (US);
Abstract
Techniques for unsupervised learning of embeddings on source code from non-local contexts are described. Code can be processed to generate an abstract syntax tree (AST) which represents syntactic paths between tokens in the code. Once the AST(s) have been generated, the paths in the AST(s) can be crawled to identify terminals (e.g., leaf nodes in the AST) and paths between terminals can be identified. The pairs of tokens identified at the ends of each path can then be used to generate a cooccurrence matrix. For example, if X number of unique terminals are identified, a matrix of size X by X can be generated to indicate a frequency at which pairs of terminals cooccur. This cooccurrence matrix can then be used as input to existing techniques for learning vector-space embeddings, such as word2vec, GloVe, Swivel, etc.
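The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: it assumes Python source parsed with the standard `ast` module, treats identifier, argument, and constant nodes as the terminals, and counts a pair as co-occurring when the AST path between the two terminals is no longer than a hypothetical `max_path_length` cutoff. The function names are illustrative only.

```python
import ast
from itertools import combinations


def terminals_with_paths(source):
    """Return (token, path) pairs, where path is the tuple of child
    indices leading from the AST root to the terminal node."""
    found = []

    def visit(node, path):
        # Assumption: names, arguments, and constants act as terminals.
        if isinstance(node, ast.Name):
            found.append((node.id, path))
        elif isinstance(node, ast.arg):
            found.append((node.arg, path))
        elif isinstance(node, ast.Constant):
            found.append((repr(node.value), path))
        for i, child in enumerate(ast.iter_child_nodes(node)):
            visit(child, path + (i,))

    visit(ast.parse(source), ())
    return found


def path_length(a, b):
    """Number of AST edges between two nodes, via their lowest common ancestor."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return len(a) + len(b) - 2 * common


def cooccurrence_matrix(source, max_path_length=6):
    """Build an X-by-X count matrix over the X unique terminals in `source`."""
    terms = terminals_with_paths(source)
    vocab = sorted({tok for tok, _ in terms})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (tok_a, path_a), (tok_b, path_b) in combinations(terms, 2):
        if path_length(path_a, path_b) <= max_path_length:
            i, j = index[tok_a], index[tok_b]
            matrix[i][j] += 1
            if i != j:
                matrix[j][i] += 1  # keep the matrix symmetric
    return vocab, matrix


if __name__ == "__main__":
    vocab, matrix = cooccurrence_matrix("def add(a, b):\n    return a + b\n")
    print(vocab)
    for row in matrix:
        print(row)
```

Because pairs are linked through the tree rather than through a sliding window over adjacent tokens, two terminals that are far apart in the text can still co-occur if they are syntactically close. The resulting matrix is the kind of input that count-based embedding methods such as GloVe or Swivel consume; that training step is outside the scope of this sketch.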