The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Feb. 01, 2022

Filed:
Sep. 27, 2018

Applicant:
International Business Machines Corporation, Armonk, NY (US);

Inventors:

Bo Zhang, Cary, NC (US);

Alexander Sobran, Chapel Hill, NC (US);

David Wehr, Sigourney, IA (US);

Halley Fede, Albany, NY (US);

Eleanor Pence, Albany, NY (US);

Joseph Hughes, Durham, NC (US);

John H. Walczyk, III, Raleigh, NC (US);

Guilherme Ferreira, Raleigh, NC (US);

Attorneys:
Primary Examiner:
Int. Cl.
CPC ...
G06K 9/62 (2006.01); G06N 3/08 (2006.01); G06N 5/02 (2006.01); G06N 20/00 (2019.01); G06F 40/30 (2020.01);
U.S. Cl.
CPC ...
G06K 9/6215 (2013.01); G06F 40/30 (2020.01); G06K 9/6256 (2013.01); G06N 3/08 (2013.01); G06N 5/022 (2013.01); G06N 20/00 (2019.01);
Abstract

A method, system and computer program product for obtaining vector representations of code snippets capturing semantic similarity. A first and second training set of code snippets are collected, where the first training set of code snippets implements the same function representing semantic similarity and the second training set of code snippets implements a different function representing semantic dissimilarity. A vector representation of a first and second code snippet from either the first or second training set of code snippets is generated using a machine learning model. A loss value is generated utilizing a loss function that is proportional or inverse to the distance between the first and second vectors in response to receiving the first and second code snippets from the first or second training set of code snippets, respectively. The machine learning model is trained to capture the semantic similarity in the code snippets by minimizing the loss value.
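
The training idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything concrete in it is an assumption made for illustration: the three-element feature vectors standing in for code snippets, the linear `embed` function standing in for the machine learning model, the choice of Euclidean distance, the specific loss (`d` for a pair implementing the same function, `1 / (d + eps)` for a pair implementing different functions), and the finite-difference gradient descent used to minimize it.

```python
import math

def embed(features, weights):
    """Toy stand-in for the model: a linear map from features to a vector."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def distance(u, v):
    """Euclidean distance between two vectors (an assumed choice of metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pair_loss(u, v, same_function, eps=1e-6):
    """Proportional to distance for similar pairs, inverse for dissimilar ones."""
    d = distance(u, v)
    return d if same_function else 1.0 / (d + eps)

def total_loss(weights, pairs):
    """Sum of pair losses over all labeled (snippet, snippet, same?) triples."""
    return sum(
        pair_loss(embed(x, weights), embed(y, weights), same)
        for x, y, same in pairs
    )

def train(weights, pairs, steps=50, lr=0.05, h=1e-5):
    """Gradient descent using finite-difference gradients (illustration only)."""
    weights = [row[:] for row in weights]  # work on a copy
    for _ in range(steps):
        base = total_loss(weights, pairs)
        grads = []
        for i, row in enumerate(weights):
            grad_row = []
            for j in range(len(row)):
                weights[i][j] += h
                grad_row.append((total_loss(weights, pairs) - base) / h)
                weights[i][j] -= h
            grads.append(grad_row)
        for i, grad_row in enumerate(grads):
            for j, g in enumerate(grad_row):
                weights[i][j] -= lr * g
    return weights

# Hypothetical feature vectors for three snippets (not real data).
loop_sum = [1.0, 0.0, 1.0]     # sums a list with a loop
builtin_sum = [1.0, 0.2, 0.8]  # sums a list with a built-in
sort_list = [0.0, 1.0, 0.2]    # sorts a list

pairs = [
    (loop_sum, builtin_sum, True),   # same function: semantically similar
    (loop_sum, sort_list, False),    # different function: dissimilar
]

initial_weights = [[0.5, 0.1, 0.3], [0.2, 0.4, 0.1]]
trained_weights = train(initial_weights, pairs)

loss_before = total_loss(initial_weights, pairs)
loss_after = total_loss(trained_weights, pairs)
print(loss_before, loss_after)
```

In this sketch, minimizing the loss tends to pull the vectors of the two summing snippets together while pushing the sorting snippet's vector away, which is the behavior the abstract describes at a high level.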

