The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Sep. 19, 2023
Filed:
May 18, 2020
Applicant:
Google LLC, Mountain View, CA (US);
Inventors:
Xinying Song, Bellevue, WA (US);
Yang Song, Bellevue, WA (US);
Assignee:
Google LLC, Mountain View, CA (US);
Abstract
Systems and methods for performing inference for word or wordpiece tokenization are disclosed using a left-to-right longest-match-first greedy process. In some examples, the vocabulary may be organized into a trie structure in which each node includes a precomputed token or token ID and a fail link, so that the tokenizer can parse the trie in a single pass to generate a list of only those tokens or token IDs that correspond to the longest matching vocabulary entries in the sample string, without the need for backtracking. In some examples, the vocabulary may be organized into a trie in which each node has a fail link, and any node that would share token(s) or token_ID(s) of a preceding node is instead given a prev_match link that points back to a chain of nodes with those token(s) or token_ID(s).
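The first arrangement described in the abstract (a trie whose nodes carry precomputed tokens and a fail link) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patent's claimed implementation. The names Node, build_trie, tokenize and pops, the "<unk>" fallback, and the trailing-space sentinel are all hypothetical choices made for this example. The sketch also assumes a plain vocabulary with no suffix indicator (such as "##"), and it does not cover the prev_match variant.

```python
from collections import deque


class Node:
    def __init__(self):
        self.children = {}  # character -> child Node
        self.token = None   # vocabulary entry that ends at this node, if any
        self.fail = None    # fail link (None means "no fallback possible")
        self.pops = []      # precomputed tokens emitted when the fail link is taken


def build_trie(vocab):
    """Build the trie, then fill in fail links and precomputed tokens in BFS order."""
    root = Node()
    for word in vocab:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.token = word

    queue = deque([root])
    while queue:
        u = queue.popleft()
        for ch, v in u.children.items():
            if v.token is not None:
                # A complete vocabulary entry: emit it and restart at the root.
                v.fail, v.pops = root, [v.token]
            else:
                # Walk the parent's fail chain until some node can consume ch,
                # collecting the tokens that each skipped fail link would emit.
                z, extra = u.fail, []
                while z is not None and ch not in z.children:
                    extra.extend(z.pops)
                    z = z.fail
                if z is not None:
                    v.fail, v.pops = z.children[ch], u.pops + extra
            queue.append(v)
    return root


def tokenize(word, root, unk="<unk>"):
    """Left-to-right longest-match-first tokenization in a single pass."""
    text = word + " "  # sentinel; assumed never to appear in the vocabulary
    tokens, u, i = [], root, 0
    while i < len(text):
        ch = text[i]
        if ch in u.children:
            u = u.children[ch]
            i += 1
        elif u.fail is not None:
            tokens.extend(u.pops)  # emit precomputed tokens, never re-read input
            u = u.fail
        else:
            break
    # Success means all of `word` was consumed and the walk ended at the root.
    if i < len(word) or u is not root:
        return [unk]
    return tokens


root = build_trie(["a", "abcdx", "b", "c", "cdy", "dz"])
print(tokenize("abcdz", root))  # ['a', 'b', 'c', 'dz']
```

In this example the walk follows a, b, c, d toward "abcdx", fails on "z", and the fail links emit "a", "b" and "c" from precomputed lists rather than by rescanning the input, which is the sense in which no backtracking is needed.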