The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
Aug. 23, 2022
Filed:
Nov. 29, 2017
Applicant:
Amazon Technologies, Inc., Seattle, WA (US);
Inventors:
Manolya McCormick, Culver City, CA (US);
Muhammad Yahia, Anaheim, CA (US);
Assignee:
Amazon Technologies, Inc., Seattle, WA (US);
Abstract
Method and apparatus for detecting text encoding errors caused by previously encoding the electronic document in multiple encoding formats. Non-word portions are removed from the electronic document. Embodiments determine whether words in the electronic document are likely to contain one or more text encoding errors, by dividing the first word into n-grams of length 2 or more. For each of the plurality of n-grams, a database is queried to determine a respective probability of the n-gram appearing in each of a plurality of recognized languages, and upon determining that the determined probabilities of two consecutive n-grams are each less than a predefined threshold probability, the first word is added to a list of words that likely contain text encoding errors. A confidence level that the first word includes the one or more text encoding errors is calculated, based on a lowest determined probability for the n-grams for the first word.
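The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The in-memory `NGRAM_DB` table, its probability values, the `THRESHOLD` value, the choice of the highest per-language probability as an n-gram's score, and the `1 - lowest probability` confidence formula are all assumptions made for illustration; the abstract only says that a database is queried and that the confidence is based on the lowest probability.

```python
import re

# Hypothetical stand-in for the n-gram database: bigram -> {language: probability}.
# All values are invented for illustration; a real system would query a store
# built from large corpora in each recognized language.
NGRAM_DB = {
    "el": {"es": 0.30, "en": 0.25},
    "me": {"es": 0.20, "en": 0.22},
    "er": {"es": 0.28, "en": 0.31},
    "ro": {"es": 0.21, "en": 0.18},
    "es": {"es": 0.33, "en": 0.27},
    "gr": {"es": 0.10, "en": 0.12},
    "ra": {"es": 0.26, "en": 0.19},
    "an": {"es": 0.29, "en": 0.30},
    "nd": {"es": 0.15, "en": 0.24},
    "de": {"es": 0.35, "en": 0.20},
}
THRESHOLD = 0.01  # assumed predefined threshold probability


def ngrams(word, n=2):
    """Split a word into overlapping n-grams of length n."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_probability(gram):
    """Assumed scoring: the best probability across all recognized languages.
    N-grams missing from the table score 0.0. Case folding is omitted."""
    return max(NGRAM_DB.get(gram, {}).values(), default=0.0)


def find_suspect_words(text, threshold=THRESHOLD):
    """Return (word, confidence) pairs for words likely to contain encoding errors."""
    # Remove non-word portions: keep runs of letters, drop digits,
    # punctuation, underscores, and whitespace.
    words = re.findall(r"[^\W\d_]+", text)
    suspects = []
    for word in words:
        probs = [ngram_probability(g) for g in ngrams(word)]
        # Flag the word when two consecutive n-grams are both below the threshold.
        flagged = any(a < threshold and b < threshold
                      for a, b in zip(probs, probs[1:]))
        if flagged:
            # Assumed confidence formula based on the lowest n-gram probability.
            suspects.append((word, 1.0 - min(probs)))
    return suspects


if __name__ == "__main__":
    # "n\u00c3\u00bamero" is "número" after a UTF-8 / Latin-1 round-trip error.
    print(find_suspect_words("el n\u00c3\u00bamero 42 es grande."))
```

In the sample text, the garbled word `nÃºmero` produces three consecutive bigrams that are absent from the table, so it is the only word flagged. Requiring two consecutive low-probability n-grams, rather than one, keeps a single rare but legitimate letter pair from flagging a word.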