The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge contains a link to the full patent document in Adobe Acrobat format (PDF).
Patent No.:
Date of Patent: Aug. 12, 2025
Filed: Jan. 27, 2022
Applicant: International Business Machines Corporation, Armonk, NY (US)
Inventors: Zhong Fang Yuan, Xi'an, CN; Tong Liu, Xi'an, CN; Wen Wang, Beijing, CN; Chen Gao, Xi'an, CN; Xiang Yu Yang, Xi'an, CN
Assignee: International Business Machines Corporation, Armonk, NY (US)
Abstract
Embodiments of the present invention provide an approach for compressing data and, more particularly, for encoding and compressing large-scale text data using absolute overfitting on pre-trained language models. The large-scale text data is parsed into sentences, and a unique token is generated for each sentence to form a token list. A generative (or compression) model is then trained, through absolute overfitting of a pre-trained language model, to produce from each token in the token list its corresponding sentence. The compressed text data is stored as the token list together with the generative model, resulting in storage space savings.
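The pipeline in the abstract (parse into sentences, assign one unique token per sentence, train a model to regenerate each sentence from its token, then store only the token list and the model) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation. The sentence splitter, the `<S{i}>` token format, and the helper names (`parse_sentences`, `make_token_list`, `MemorizingModel`, `compress`, `decompress`) are all hypothetical. `MemorizingModel` is a plain dictionary standing in for a fine-tuned pre-trained language model, and "absolute overfitting" is read here as "keep training until every token regenerates its sentence exactly"; that reading is our assumption. The stand-in produces no storage savings by itself. It only shows the data flow and the exact-reconstruction check.

```python
import re
from typing import Dict, List, Tuple


def parse_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def make_token_list(sentences: List[str]) -> List[str]:
    # One unique token per sentence; the "<S{i}>" format is hypothetical.
    return [f"<S{i}>" for i in range(len(sentences))]


class MemorizingModel:
    """Hypothetical stand-in for a pre-trained language model trained to
    absolute overfitting: it learns to emit the exact sentence for each token."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def train_step(self, pairs: List[Tuple[str, str]]) -> None:
        # A real model would take a gradient step here; the stand-in memorizes.
        for token, sentence in pairs:
            self._table[token] = sentence

    def generate(self, token: str) -> str:
        return self._table.get(token, "")


def compress(text: str, max_epochs: int = 100) -> Tuple[List[str], MemorizingModel]:
    sentences = parse_sentences(text)
    tokens = make_token_list(sentences)
    pairs = list(zip(tokens, sentences))
    model = MemorizingModel()
    for _ in range(max_epochs):
        model.train_step(pairs)
        # Stop only once every token regenerates its sentence exactly
        # (our reading of "absolute overfitting"), so decoding is lossless.
        if all(model.generate(t) == s for t, s in pairs):
            return tokens, model
    raise RuntimeError("model did not reach exact reconstruction")


def decompress(tokens: List[str], model: MemorizingModel) -> str:
    # Joins with a single space (assumption): original inter-sentence
    # whitespace such as newlines is not preserved by this sketch.
    return " ".join(model.generate(t) for t in tokens)


if __name__ == "__main__":
    text = "Data is parsed. Each sentence gets a token! Is it lossless?"
    tokens, model = compress(text)
    print(tokens)                              # ['<S0>', '<S1>', '<S2>']
    print(decompress(tokens, model) == text)   # True
```

In a real system, `MemorizingModel` would be replaced by a language model whose stored parameters are what the token list is decoded against; the exact-match check in `compress` is the part that carries over, since any sentence the model fails to reproduce would make decompression lossy.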