The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Jun. 08, 2021

Filed:
May 25, 2018

Applicant:
Microsoft Technology Licensing, LLC, Redmond, WA (US);

Inventors:

Abedelkader Asi, Kfar Bara, IL;

Liron Izhaki-Allerhand, Holon, IL;

Ran Mizrachi, Mishmar Hashiva, IL;

Royi Ronen, Tel Aviv, IL;

Ohad Jassin, Tel Mond, IL;

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 15/22 (2006.01); G10L 15/06 (2013.01); G10L 15/16 (2006.01); G10L 15/18 (2013.01); G06F 40/44 (2020.01);
U.S. Cl.
CPC ...
G10L 15/22 (2013.01); G06F 40/44 (2020.01); G10L 15/063 (2013.01); G10L 15/16 (2013.01); G10L 15/1822 (2013.01);
Abstract

Technology is disclosed for providing dynamic identification and extraction or tagging of contextually-coherent text blocks from an electronic document. In an embodiment, an electronic document may be parsed into a plurality of content tokens that each corresponds to a portion of the electronic document, such as a sentence or a paragraph. Employing a sliding window approach, a number of token groups are independently analyzed, where each group of tokens has a different number of tokens included therein. Each token group is analyzed to determine confidence scores for various determinable contexts based on content included in the token set. The confidence scores can then be processed for each token group to determine an entropy score for the token group. In this way, one of the analyzed token groups can be selected as a representative text block that corresponds to one of the plurality of determinable contexts. A corresponding portion of the electronic document can be tagged with a corresponding context determined based on the analyzed content included therein, and provided for output.
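
The pipeline described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the keyword-count scorer stands in for whatever context classifier an embodiment would use, sentences are used as content tokens, Shannon entropy is used as the entropy score, and the group with the lowest entropy is treated as the representative text block. All function and variable names are hypothetical.

```python
import math
import re

# Hypothetical keyword lists standing in for a trained context classifier.
CONTEXT_KEYWORDS = {
    "sports": {"game", "team", "score", "coach"},
    "finance": {"stock", "market", "bank", "shares"},
}


def split_into_tokens(document):
    """Parse the document into content tokens; here, one token per sentence."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [p.strip() for p in parts if p.strip()]


def confidence_scores(token_group):
    """Return one normalized confidence score per context (toy model)."""
    words = re.findall(r"[a-z]+", " ".join(token_group).lower())
    # Add-one smoothing keeps every context at a nonzero score.
    counts = {
        context: 1 + sum(word in keywords for word in words)
        for context, keywords in CONTEXT_KEYWORDS.items()
    }
    total = sum(counts.values())
    return {context: n / total for context, n in counts.items()}


def entropy(scores):
    """Shannon entropy (in bits) of a set of confidence scores."""
    return -sum(p * math.log2(p) for p in scores.values() if p > 0)


def best_block(tokens, start, max_size):
    """Score groups of 1..max_size tokens beginning at `start`.

    Returns (entropy, token_group, context) for the group with the lowest
    entropy. Choosing the minimum is an assumption of this sketch.
    """
    best = None
    for size in range(1, max_size + 1):
        group = tokens[start:start + size]
        if len(group) < size:
            break  # ran past the end of the document
        scores = confidence_scores(group)
        h = entropy(scores)
        if best is None or h < best[0]:
            best = (h, group, max(scores, key=scores.get))
    return best
```

Calling `best_block(split_into_tokens(text), 0, 3)` analyzes three differently sized groups that all start at the first sentence and returns the one whose confidence scores are most concentrated on a single context, together with that context as its tag. Sliding `start` across the document would repeat the analysis for each window position.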

