The patent badge is an abbreviated version of the USPTO patent document. It covers the patent number, issue date, filing date, title, applicant, inventors, assignee, attorney firm, primary examiner, assistant examiner, CPC classifications, and abstract, and it includes a link to the full patent document in PDF (Adobe Acrobat) format.

Date of Patent: Jan. 30, 2024

Filed: Jul. 01, 2021

Applicant: Google LLC, Mountain View, CA (US)

Inventors: Junjie Ke, East Palo Alto, CA (US); Feng Yang, Sunnyvale, CA (US); Qifei Wang, Mountain View, CA (US); Yilin Wang, Sunnyvale, CA (US); Peyman Milanfar, Menlo Park, CA (US)

Assignee: Google LLC, Mountain View, CA (US)

Attorney:
Primary Examiner:
Int. Cl.: G06K 9/00 (2022.01); G06T 3/00 (2006.01); G06T 3/40 (2006.01); G06T 7/00 (2017.01)

U.S. Cl. CPC: G06T 3/0012 (2013.01); G06T 3/40 (2013.01); G06T 7/0002 (2013.01); G06T 2207/20016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/30168 (2013.01)
Abstract

The technology employs a patch-based multi-scale Transformer that is usable with various imaging applications. This avoids constraints on a fixed input image size and predicts quality effectively on a native-resolution image. A native-resolution image is transformed into a multi-scale representation, enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding is employed to map patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding is employed to distinguish patches coming from different scales in the multi-scale representation. Self-attention is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token to the set of input tokens.
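As a rough illustration of the tokenization pipeline the abstract describes, the sketch below builds a multi-scale representation of an image, hashes patch positions at every scale to the same fixed grid for the spatial embedding, adds a per-scale embedding, and prepends a classification token before self-attention. This is a minimal sketch under assumed values for the patch size, grid size, and embedding dimension; the names (PATCH, GRID, DIM, hash_to_grid, make_tokens) are illustrative rather than taken from the patent, and random weights stand in for learned parameters.

import numpy as np

PATCH = 32   # patch side length (assumed)
GRID = 10    # fixed G x G spatial grid (assumed)
DIM = 64     # token embedding dimension (assumed)

rng = np.random.default_rng(0)
proj = rng.normal(0, 0.02, (PATCH * PATCH * 3, DIM))   # linear patch projection
spatial_emb = rng.normal(0, 0.02, (GRID, GRID, DIM))   # spatial embedding, shared across scales
scale_emb = rng.normal(0, 0.02, (3, DIM))              # one embedding per scale
cls_token = rng.normal(0, 0.02, (1, DIM))              # learnable classification token

def hash_to_grid(i, n):
    # Map patch index i of n patches along one axis onto the fixed grid,
    # so patches at every scale land on the same GRID x GRID positions.
    return min(GRID - 1, int(i * GRID / n))

def make_tokens(image, scale_idx):
    # Cut one (H, W, 3) image into PATCH x PATCH patches and embed them.
    h, w = image.shape[0] // PATCH, image.shape[1] // PATCH
    tokens = []
    for i in range(h):
        for j in range(w):
            patch = image[i*PATCH:(i+1)*PATCH, j*PATCH:(j+1)*PATCH].reshape(-1)
            tok = patch @ proj
            tok += spatial_emb[hash_to_grid(i, h), hash_to_grid(j, w)]  # shared grid
            tok += scale_emb[scale_idx]                                 # marks the scale
            tokens.append(tok)
    return np.stack(tokens)

# Multi-scale representation: the native-resolution image plus two
# downscaled copies (strided subsampling here purely for brevity).
native = rng.random((224, 320, 3))
pyramid = [native, native[::2, ::2], native[::4, ::4]]
tokens = np.concatenate([make_tokens(img, s) for s, img in enumerate(pyramid)])
tokens = np.concatenate([cls_token, tokens])  # prepend the classification token
print(tokens.shape)  # (1 + total patches, DIM): the Transformer's input sequence

The resulting token matrix is what a Transformer encoder would consume; hashing patch positions to one fixed grid is what frees the model from a fixed input size, since images of any native resolution share the same spatial embedding table.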

