The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The badge covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The linked full patent document is in Adobe Acrobat (PDF) format.

Date of Patent:
Mar. 08, 2022

Filed:
Feb. 25, 2020

Applicant:
International Business Machines Corporation, Armonk, NY (US);

Inventors:

Jonathan Herzig, Tel-Aviv, IL;

Achiya Jerbi, Netanya, IL;

David Konopnicki, Haifa, IL;

Guy Lev, Tel-Aviv, IL;

Michal Shmueli-Scheuer, Tel-Aviv, IL;

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 17/00 (2019.01); G06F 40/12 (2020.01); G06F 40/274 (2020.01); G06N 20/10 (2019.01); G06N 7/00 (2006.01); G06F 40/30 (2020.01);
U.S. Cl.
CPC ...
G06F 40/12 (2020.01); G06F 40/274 (2020.01); G06F 40/30 (2020.01); G06N 7/005 (2013.01); G06N 20/10 (2019.01);
Abstract

Embodiments may provide techniques to generate training data for summarization of complex documents, such as scientific papers, articles, etc., that are scalable to provide large scale training data. For example, in an embodiment, a method may be implemented in a computer system and may comprise collecting a plurality of video and audio recordings of presentations of documents, collecting a plurality of documents corresponding to the video and audio recordings, converting the plurality of video and audio recordings of presentations of documents into transcripts of the plurality of presentations, generating a summary of each document by selecting a plurality of sentences from each document using the transcript of that document, generating a dataset comprising a plurality of the generated summaries, and training a machine learning model using the generated dataset.
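
The sentence-selection and dataset-generation steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how sentences are selected, so the word-overlap scoring below is an assumption chosen for simplicity, not the patented method. The function names (`select_summary`, `build_dataset`) and the parameter `k` are hypothetical, and the transcription and model-training steps are left out.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_summary(document, transcript, k=2):
    """Return the k document sentences that best overlap the transcript.

    Assumption: a sentence's score is the fraction of its words that also
    occur in the transcript. Selected sentences keep their document order.
    """
    transcript_words = tokenize(transcript)
    sentences = split_sentences(document)

    def score(sentence):
        words = tokenize(sentence)
        return len(words & transcript_words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def build_dataset(pairs, k=2):
    """Turn (document, transcript) pairs into document/summary records."""
    return [
        {"document": doc, "summary": " ".join(select_summary(doc, tr, k))}
        for doc, tr in pairs
    ]


# Hypothetical toy inputs, for illustration only.
document = ("We propose a new parser. The weather was nice in Florence. "
            "Our parser improves accuracy on two benchmarks.")
transcript = ("today I will talk about our new parser and how it improves "
              "accuracy on benchmarks")
dataset = build_dataset([(document, transcript)])
print(dataset[0]["summary"])
```

Each record pairs a document with an extractive summary made of its own sentences, which is the shape of training data the abstract describes. A real system would likely use a more robust alignment between spoken and written text than raw word overlap.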

