The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
Jun. 25, 2024
Filed:
Dec. 01, 2021
Applicant:
Google LLC, Mountain View, CA (US);
Inventors:
Jiafan Zhu, San Jose, CA (US);
Jianqiao Liu, Basking Ridge, NJ (US);
Xiangyu Dong, Sunnyvale, CA (US);
Xiao Zhang, San Jose, CA (US);
Jikai Tang, Mountain View, CA (US);
Kexin Yang, Sunnyvale, CA (US);
Yong Zhao, Sunnyvale, CA (US);
Alireza Ghaffarkhah, San Jose, CA (US);
Arash Rezaei, Saratoga, CA (US);
Dayou Du, Jersey City, NJ (US);
Yazhou Zu, San Francisco, CA (US);
Xiangling Kong, Sunnyvale, CA (US);
Hoang-Vu Dang, San Jose, CA (US);
Alexander Vadimovich Kolbasov, Palo Alto, CA (US);
Assignee:
Google LLC, Mountain View, CA (US);
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media for performing preflight checks of a distributed computing system, are described. In one aspect, a method includes assigning a computing workload to a first subset of hardware accelerator machines each having one or more hardware accelerators. A preflight check on the first subset is performed before performing the computing workload to verify the functionality of each machine in the first subset. For each hardware accelerator machine of the first subset, a program code package is installed, including a task action based at least in part on characteristics of the computing workload. The task action including a sequence of operations is performed on the hardware accelerator machine to determine whether the task action fails. Whenever the task action fails, the computing workload is re-assigned to a second subset of hardware accelerator machines different from the first subset.
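The flow described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the names `Machine`, `build_task_action`, `preflight_check`, and `assign_workload`, the `healthy` and `interconnect_ok` flags, and the `needs_interconnect` workload characteristic are all hypothetical, chosen only to show the control flow of choosing a task action from workload characteristics, running it on every machine in a subset, and falling back to a different subset on failure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Machine:
    """Hypothetical stand-in for a hardware accelerator machine."""
    name: str
    healthy: bool = True          # assumed flag: accelerators respond correctly
    interconnect_ok: bool = True  # assumed flag: links between accelerators work


# An operation returns True on success; a task action is a sequence of operations.
Operation = Callable[[Machine], bool]


def build_task_action(workload: Dict[str, bool]) -> List[Operation]:
    """Choose operations based on (hypothetical) workload characteristics."""
    operations: List[Operation] = [lambda m: m.healthy]
    if workload.get("needs_interconnect", False):
        operations.append(lambda m: m.interconnect_ok)
    return operations


def preflight_check(subset: Sequence[Machine], task_action: Sequence[Operation]) -> bool:
    """Run the task action on every machine; fail if any operation fails."""
    return all(op(machine) for machine in subset for op in task_action)


def assign_workload(
    workload: Dict[str, bool], subsets: Sequence[Sequence[Machine]]
) -> Optional[Sequence[Machine]]:
    """Return the first subset that passes the preflight check, or None."""
    task_action = build_task_action(workload)
    for subset in subsets:
        if preflight_check(subset, task_action):
            return subset
        # The check failed, so the workload is re-assigned to the next subset.
    return None


first = [Machine("a0"), Machine("a1", interconnect_ok=False)]
second = [Machine("b0"), Machine("b1")]
chosen = assign_workload({"needs_interconnect": True}, [first, second])
print([m.name for m in chosen] if chosen else "no subset passed")
```

In this sketch the first subset fails only when the workload actually needs the interconnect, which mirrors the abstract's point that the task action depends at least in part on the characteristics of the computing workload.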