The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge contains a link to the full patent document (in Adobe Acrobat format, aka PDF).

Patent No.:

US 12481839 B1

Date of Patent:

Nov. 25, 2025

Filed:

Dec. 14, 2023

System and method for layout-aware generative pretraining for visually rich document understanding

Applicant:

JPMorgan Chase Bank, N.A., New York, NY (US);

Inventors:

Dongsheng Wang, London, GB;

Natraj Raman, London, GB;

Mathieu Sibue, New York, NY (US);

Zhiqiang Ma, Short Hills, NJ (US);

Petr Babkin, San Francisco, CA (US);

Simerjot Kaur, Jersey City, NJ (US);

Yulong Pei, Eindhoven, NL;

Armineh Nourbakhsh, Pittsburgh, PA (US);

Assignee:

JPMORGAN CHASE BANK, N.A., New York, NY (US);

Attorney:

Greenblum & Bernstein, P.L.C.

Primary Examiner:

Antim G Shah

Int. Cl.

CPC ...

G06F 40/40 (2020.01); G06V 30/19 (2022.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01);

U.S. Cl.

CPC ...

G06F 40/40 (2020.01); G06V 30/19147 (2022.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01);

Abstract

Various methods and processes, apparatuses/systems, and media for performing spatial-aware reading for visual documents are disclosed. A processor implements a language model; modifies existing parameters and architecture of the language model to incorporate new parameters in its architecture by integrating a disentangled spatial attention process to the language model; pretrains the modified model by performing an autoregressive block infilling process on a plurality of document pages thereby training the new parameters and further adjusting the existing parameters; instruction-tunes the pretrained model on data derived from a plurality of visually rich document understanding datasets to teach the pretrained model to follow document-oriented instructions or answer questions about documents by leveraging their content and their spatial layout and outputting a trained model; and performs spatial-aware reading for visual documents by utilizing the trained model.
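The abstract names autoregressive block infilling as the pretraining objective but does not describe how training examples are built. The sketch below shows one common way to do it, in the style of GLM-type blank infilling: selected spans of a page are collapsed to a single mask token, then appended at the end and predicted left to right. Everything specific here is an illustrative assumption, not the patented method: the `Token` type, the `[MASK]`/`[START]`/`[END]` tokens, the function names, caller-supplied spans (instead of random sampling), and giving each mask token the union bounding box of its block. It covers only input construction; the disentangled spatial attention and instruction-tuning steps are not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates

# Hypothetical special tokens; the abstract does not specify a vocabulary.
MASK, START, END = "[MASK]", "[START]", "[END]"


@dataclass(frozen=True)
class Token:
    text: str
    bbox: BBox


def union_bbox(tokens: List[Token]) -> BBox:
    """Smallest box covering all tokens (assumed layout for a masked block)."""
    return (
        min(t.bbox[0] for t in tokens),
        min(t.bbox[1] for t in tokens),
        max(t.bbox[2] for t in tokens),
        max(t.bbox[3] for t in tokens),
    )


def block_infilling_example(
    tokens: List[Token], spans: List[Tuple[int, int]]
) -> Tuple[List[Token], List[Optional[str]]]:
    """Build (inputs, targets) for one autoregressive block-infilling example.

    `spans` are non-overlapping half-open [start, end) ranges over `tokens`.
    Part A: the page with each block collapsed to one MASK token.
    Part B: each block prefixed with START, to be predicted left to right.
    targets[i] is the token to predict after reading inputs[: i + 1];
    None means no loss at that position.
    """
    part_a: List[Token] = []
    part_b_inputs: List[Token] = []
    part_b_targets: List[str] = []
    prev_end = 0
    for start, end in sorted(spans):
        if not (prev_end <= start < end <= len(tokens)):
            raise ValueError(f"invalid or overlapping span {(start, end)}")
        block = tokens[start:end]
        box = union_bbox(block)
        part_a.extend(tokens[prev_end:start])
        part_a.append(Token(MASK, box))
        # Teacher forcing: feed START + block, predict block + END.
        part_b_inputs.append(Token(START, box))
        part_b_inputs.extend(block)
        part_b_targets.extend(t.text for t in block)
        part_b_targets.append(END)
        prev_end = end
    part_a.extend(tokens[prev_end:])
    inputs = part_a + part_b_inputs
    targets: List[Optional[str]] = [None] * len(part_a) + part_b_targets
    return inputs, targets


if __name__ == "__main__":
    words = ["Invoice", "No.", "12345", "Total", "$99"]
    # Toy layout: one row of words, 100 units apart.
    page = [Token(w, (100 * i, 0, 100 * i + 80, 20)) for i, w in enumerate(words)]
    inputs, targets = block_infilling_example(page, [(1, 3)])
    print([t.text for t in inputs])
    # ['Invoice', '[MASK]', 'Total', '$99', '[START]', 'No.', '12345']
    print(targets)
    # [None, None, None, None, 'No.', '12345', '[END]']
```

Keeping a bounding box on every input position, including the special tokens, is what would let a layout-aware attention layer relate a masked block to its surroundings; in a real pipeline the spans would typically be sampled at random and the text split into subword tokens.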