
The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Patent No.:

US 11907674 B1

Date of Patent:

Feb. 20, 2024

Filed:

Sep. 20, 2023

Generating multi-modal response(s) through utilization of large language model(s)

Applicant:

Google LLC, Mountain View, CA (US);

Inventors:

Oscar Akerlund, Zurich, CH;

Evgeny Sluzhaev, Zurich, CH;

Golnaz Ghiasi, Mountain View, CA (US);

Thang Luong, Santa Clara, CA (US);

Yifeng Lu, Mountain View, CA (US);

Igor Petrovski, Zurich, CH;

Ágoston Weisz, Zurich, CH;

Wei Yu, Mountain View, CA (US);

Rakesh Shivanna, Sunnyvale, CA (US);

Michael Andrew Goodman, Oakland, CA (US);

Apoorv Kulshreshtha, Mountain View, CA (US);

Yu Du, Sunnyvale, CA (US);

Amin Ghafouri, San Francisco, CA (US);

Sanil Jain, Sunnyvale, CA (US);

Dustin Tran, San Francisco, CA (US);

Vikas Peswani, Mountain View, CA (US);

YaGuang Li, Sunnyvale, CA (US);

Assignee:

GOOGLE LLC, Mountain View, CA (US);

Attorney:

Gray Ice Higdon

Primary Examiner:

Edgar X Guerra-Erazo

Int. Cl.

CPC ...

G06F 40/00 (2020.01); G06F 40/40 (2020.01);

U.S. Cl.

CPC ...

G06F 40/40 (2020.01);

Abstract

Implementations relate to generating multi-modal response(s) through utilization of large language model(s) (LLM(s)). Processor(s) of a system can: receive natural language (NL) based input, generate a multi-modal response that is responsive to the NL based input, and cause the multi-modal response to be rendered. In some implementations, and in generating the multi-modal response, the processor(s) can process, using an LLM, LLM input (e.g., that includes at least the NL based input) to generate LLM output, and determine, based on the LLM output, textual content for inclusion in the multi-modal response and multimedia content for inclusion in the multi-modal response. In some implementations, the multimedia content can be obtained based on a multimedia content tag that is included in the LLM output and that is indicative of the multimedia content. In various implementations, the multimedia content can be interleaved between segments of the textual content.
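
The tag-based flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify a tag syntax or a retrieval mechanism, so the `[MEDIA: ...]` format, the function names, and the stand-in `fake_fetch_media` lookup are all assumptions made for illustration. The sketch splits LLM output into text segments and multimedia content tags, resolves each tag to content, and keeps the media interleaved between the text segments.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical tag syntax: the abstract only says the LLM output includes a
# "multimedia content tag"; the "[MEDIA: description]" form is an assumption.
TAG_PATTERN = re.compile(r"\[MEDIA:\s*(.*?)\]")


def split_llm_output(llm_output: str) -> List[Tuple[str, str]]:
    """Split LLM output into ordered ("text", ...) and ("media_tag", ...) parts."""
    parts: List[Tuple[str, str]] = []
    cursor = 0
    for match in TAG_PATTERN.finditer(llm_output):
        text = llm_output[cursor:match.start()].strip()
        if text:
            parts.append(("text", text))
        parts.append(("media_tag", match.group(1).strip()))
        cursor = match.end()
    tail = llm_output[cursor:].strip()
    if tail:
        parts.append(("text", tail))
    return parts


def build_multimodal_response(
    llm_output: str, fetch_media: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Resolve each tag to media content, preserving its position in the text."""
    response: List[Tuple[str, str]] = []
    for kind, value in split_llm_output(llm_output):
        if kind == "text":
            response.append(("text", value))
        else:
            # The tag is "indicative of" the content; here it is a search-like
            # description handed to a caller-supplied lookup function.
            response.append(("media", fetch_media(value)))
    return response


def fake_fetch_media(description: str) -> str:
    """Stand-in for a real image or video lookup (hypothetical)."""
    return "media://" + description.replace(" ", "-")


example_output = (
    "Golden retrievers are friendly. [MEDIA: golden retriever puppy] "
    "They need daily exercise."
)
example_response = build_multimodal_response(example_output, fake_fetch_media)
```

In this sketch, `example_response` holds a text segment, then a media item, then a second text segment, which mirrors the interleaving described in the last sentence of the abstract. A real system would replace `fake_fetch_media` with an actual retrieval or generation step.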