The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date the patent was issued, Date the patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent: Aug. 12, 2025

Filed: Jun. 09, 2023

Applicants:

Robert Bosch GmbH, Stuttgart, DE;

Carnegie Mellon University, Pittsburgh, PA (US);

Inventors:

Yutong He, Pittsburgh, PA (US);

Ruslan Salakhutdinov, Pittsburgh, PA (US);

Jeremy Kolter, Pittsburgh, PA (US);

Marcus Pereira, Pittsburgh, PA (US);

João D. Semedo, Pittsburgh, PA (US);

Bahare Azari, San Jose, CA (US);

Filipe J. Cabrita Condessa, Pittsburgh, PA (US);

Assignees:

Robert Bosch GmbH, DE;

Carnegie Mellon University, Pittsburgh, PA (US);

Int. Cl.
CPC: G06T 11/60 (2006.01); G06F 16/432 (2019.01); G06F 16/438 (2019.01); G06F 40/284 (2020.01); G06F 40/40 (2020.01); G06N 3/0475 (2023.01)
U.S. Cl.
CPC: G06T 11/60 (2013.01); G06F 16/432 (2019.01); G06F 16/438 (2019.01); G06F 40/284 (2020.01); G06F 40/40 (2020.01); G06N 3/0475 (2023.01)
Abstract

A method is disclosed that includes: receiving, at a cross-attention layer of a model, first text data describing a first object and second text data describing a first scene, wherein the first text data includes a description of a location of the first object; utilizing the model with cross-attention layers, concatenating the first text data and the second text data to generate a prompt; generating a broadcasted location mask constructed from at least the location; generating a broadcasted all-one matrix associated with the second text data describing the first scene; computing a key matrix and a value matrix utilizing separate linear projections of the prompt; computing a query matrix utilizing linear projections; generating a broadcasted location matrix in response to concatenating the broadcasted location mask and the broadcasted all-one matrix; generating a cross-attention map utilizing the query matrix, the key matrix, and the broadcasted location matrix; and outputting a final image.
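
The abstract lists the steps but not the exact arithmetic that combines them. The following is a minimal plain-Python sketch of one plausible reading, not the patented implementation. Assumptions not stated in the abstract: the object's location is given as a binary pixel mask; the query matrix is projected from image features; the broadcasted location matrix gates the attention weights by elementwise multiplication after a row-wise softmax; and the helper names `build_location_matrix` and `location_cross_attention`, along with all toy embeddings and weights, are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Softmax over each row (the token axis), stabilized by subtracting the row max."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def build_location_matrix(box_mask: Matrix, n_obj_tokens: int, n_scene_tokens: int) -> Matrix:
    """Hypothetical helper. Rows = pixels (row-major), columns = prompt tokens.
    Object-token columns repeat the flattened location mask (broadcasted location mask);
    scene-token columns are all ones (broadcasted all-one matrix)."""
    flat = [v for row in box_mask for v in row]
    obj_part = [[v] * n_obj_tokens for v in flat]
    scene_part = [[1.0] * n_scene_tokens for _ in flat]
    # Concatenate along the token axis, object tokens first to match the prompt order.
    return [o + s for o, s in zip(obj_part, scene_part)]


def location_cross_attention(x: Matrix, prompt_emb: Matrix, w_q: Matrix, w_k: Matrix,
                             w_v: Matrix, loc: Matrix) -> Tuple[Matrix, Matrix]:
    """ASSUMED form: A = softmax(Q K^T / sqrt(d)) * L (elementwise), output = A V."""
    q = matmul(x, w_q)            # ASSUMPTION: queries projected from image features
    k = matmul(prompt_emb, w_k)   # key matrix: linear projection of the prompt
    v = matmul(prompt_emb, w_v)   # value matrix: separate linear projection of the prompt
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    attn = [[a * m for a, m in zip(a_row, m_row)]
            for a_row, m_row in zip(softmax_rows(scores), loc)]
    return attn, matmul(attn, v)


# Toy usage (all values hypothetical): 2x2 image, object in the left column.
obj_emb = [[1.0, 0.0]]                    # 1 token for the object description
scene_emb = [[0.0, 1.0], [1.0, 1.0]]      # 2 tokens for the scene description
prompt = obj_emb + scene_emb              # concatenated prompt, object tokens first
box = [[1.0, 0.0],
       [1.0, 0.0]]
loc = build_location_matrix(box, n_obj_tokens=len(obj_emb), n_scene_tokens=len(scene_emb))
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # 4 pixels x 2 features
eye = [[1.0, 0.0], [0.0, 1.0]]                        # identity projections for brevity
attn, out = location_cross_attention(x, prompt, eye, eye, eye, loc)
# Pixels outside the box (indices 1 and 3) get zero attention on the object token.
print(loc)  # → [[1.0, 1.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
```

Because the gate is applied after the softmax in this sketch, rows for pixels outside the object region no longer sum to one. Renormalizing those rows, or masking the scores with a large negative value before the softmax, are alternative design choices that the abstract does not settle.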

