The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Patent No.:

US 11487944 B1

Date of Patent:

Nov. 01, 2022

Filed:

Sep. 24, 2020

System, method, and computer program for obtaining a unified named entity recognition model with the collective predictive capabilities of teacher models with different tag sets using marginal distillation

Applicant:

ASAPP, Inc., New York, NY (US);

Inventors:

Yi Yang, New York, NY (US);

Keunwoo Peter Yu, Detroit, MI (US);

Assignee:

ASAPP, Inc., New York, NY (US);

Attorney:

Lessani Law Group, PC

Primary Examiner:

Feng-Tzer Tzeng

Int. Cl.

G06F 40/295 (2020.01); G06N 3/04 (2006.01); G06N 3/08 (2006.01); G06N 20/00 (2019.01);

U.S. Cl.

CPC ...

G06F 40/295 (2020.01); G06N 3/0454 (2013.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01);

Abstract

The present disclosure sets forth a marginal distillation approach to obtaining a unified named-entity recognition (NER) student model from a plurality of pre-trained teacher NER models with different tag sets. Knowledge from the teacher models is distilled into a student model without requiring access to the annotated training data used to train the teacher models. In particular, the system receives a tag hierarchy that combines the different teacher tag sets. The teacher models and the student model are applied to a set of input data sequences to obtain tag predictions for each of the models. A distillation loss is computed between the student and each of the teacher models. If a teacher's predictions are less fine-grained than the student's with respect to a node in the tag hierarchy, the student's more fine-grained predictions for the node are marginalized in computing the distillation loss. The overall loss is minimized, resulting in the student model acquiring the collective predictive capabilities of the teacher models.
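
The marginalization step described in the abstract can be illustrated with a minimal Python sketch for a single token and a single teacher. This is not the patented implementation. The tag names, the `TO_TEACHER` mapping (including the choice to fold the student's `PER` tag into the teacher's `O` tag), the probability values, and the use of cross-entropy as the distillation loss are all assumptions made for illustration.

```python
import math

# Hypothetical tag sets: the student is more fine-grained than the teacher.
STUDENT_TAGS = ["O", "PER", "CITY", "COUNTRY"]
TEACHER_TAGS = ["O", "LOC"]

# Hypothetical tag hierarchy, flattened to "student tag -> teacher tag".
# CITY and COUNTRY are children of LOC; this teacher never learned PER,
# so PER is assumed to fall under the teacher's O tag.
TO_TEACHER = {"O": "O", "PER": "O", "CITY": "LOC", "COUNTRY": "LOC"}


def marginalize(student_probs, student_tags, teacher_tags, to_teacher):
    """Sum the student's fine-grained probabilities into the teacher's tag set."""
    marginal = {tag: 0.0 for tag in teacher_tags}
    for tag, prob in zip(student_tags, student_probs):
        marginal[to_teacher[tag]] += prob
    return [marginal[tag] for tag in teacher_tags]


def distillation_loss(teacher_probs, student_marginal, eps=1e-12):
    """Cross-entropy of the marginalized student against the teacher's soft targets."""
    return -sum(
        q * math.log(p + eps) for q, p in zip(teacher_probs, student_marginal)
    )


# One token: student distribution over its four tags, teacher over its two.
student_probs = [0.1, 0.2, 0.4, 0.3]
teacher_probs = [0.2, 0.8]

student_marginal = marginalize(student_probs, STUDENT_TAGS, TEACHER_TAGS, TO_TEACHER)
loss = distillation_loss(teacher_probs, student_marginal)
print(student_marginal, loss)
```

In a full training loop, a loss of this kind would be summed over all tokens and all teachers, each teacher with its own mapping from the tag hierarchy, and the sum minimized with respect to the student's parameters.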