The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Mar. 20, 2018
Filed:
May 19, 2016
Applicant:
International Business Machines Corporation, Armonk, NY (US);
Inventors:
Donna K. Byron, Petersham, MA (US);
Renee F. Decker, Brunswick, MD (US);
Suzanne L. Estrada, Boca Raton, FL (US);
Aditya S. Gaitonde, Marlborough, MA (US);
Daniel M. Jamrog, Acton, MA (US);
John A. Morganti, Austin, TX (US);
Samir J. Patel, Billerica, MA (US);
Joseph F. Zaffarano, Malden, MA (US);
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US);
Abstract
Aspects of processing misaligned annotations include receiving a tokenized document and offset annotation file at a processor. The tokenized document includes a source document and corresponding tokens resulting from a low-level segmentation process. Annotations from the annotation file are applied, in conjunction with tokenization rules, to the source document, and a misalignment responsive to the applying is determined. If the misalignment is caused by an offset mismatch, an offset number of characters between the position counts in the annotation file and the source document is calculated, and the position count in the annotation file is adjusted to coincide with the position count in the source document. If the misalignment is not caused by an offset mismatch, a current position count in the source document is reset to a position count of a previous location in which a most recent alignment between the annotation file and the source document was ascertained.
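The realignment logic described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the `Annotation` type, the `realign` function, the `window` search radius, and the use of a simple substring search to detect an offset mismatch are all assumptions made for illustration, and the tokenization rules mentioned in the abstract are omitted. The sketch keeps a running offset correction and remembers the position of the most recent confirmed alignment, which it falls back to when a mismatch cannot be explained by an offset.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical standoff annotation: character span plus the covered text."""
    start: int
    end: int
    text: str
    label: str


def realign(source, annotations, window=10):
    """Apply annotations to source, correcting offset mismatches where possible.

    Returns (aligned, unresolved). `window` is an assumed search radius in
    characters around the expected position.
    """
    aligned, unresolved = [], []
    shift = 0      # cumulative correction applied to annotation offsets
    last_good = 0  # source position of the most recent confirmed alignment

    for ann in annotations:
        start, end = ann.start + shift, ann.end + shift
        if source[start:end] == ann.text:
            # Annotation lines up with the source: record it and move on.
            aligned.append(Annotation(start, end, ann.text, ann.label))
            last_good = end
            continue

        # Misalignment: look for the annotated text near the expected position.
        lo = max(last_good, start - window)
        found = source.find(ann.text, lo, end + window)
        if found != -1:
            # Offset mismatch: compute the offset and adjust later positions.
            shift += found - start
            new_end = found + len(ann.text)
            aligned.append(Annotation(found, new_end, ann.text, ann.label))
            last_good = new_end
        else:
            # Not an offset mismatch: set the annotation aside and stay at the
            # last confirmed alignment (last_good is left unchanged).
            unresolved.append(ann)

    return aligned, unresolved


if __name__ == "__main__":
    source = "Dr. Smith treated Ms. Jones in Boston."
    # Offsets here are 2 characters too large, as if the annotation file had
    # been produced against a copy of the text with 2 extra leading characters.
    annotations = [
        Annotation(6, 11, "Smith", "PERSON"),
        Annotation(24, 29, "Jones", "PERSON"),
        Annotation(33, 38, "Paris", "LOCATION"),   # text not in the source
        Annotation(33, 39, "Boston", "LOCATION"),
    ]
    aligned, unresolved = realign(source, annotations)
    for a in aligned:
        print(a.start, a.end, a.text, a.label)
    print("unresolved:", [a.text for a in unresolved])
```

In this example the first annotation triggers a one-time correction of 2 characters, after which the later annotations line up without further searching. The annotation whose text does not occur in the source is reported as unresolved rather than forced into place.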