The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Apr. 28, 2020

Filed:
Apr. 02, 2018

Applicant:

Cloudera, Inc., Palo Alto, CA (US);

Inventors:

Sudhanshu Arora, Sunnyvale, CA (US);

Mark Donsky, San Francisco, CA (US);

Guang Yao Leng, Mountain View, CA (US);

Naren Koneru, Fremont, CA (US);

Chang She, San Francisco, CA (US);

Vikas Singh, San Jose, CA (US);

Himabindu Vuppula, Saratoga, CA (US);

Assignee:

Cloudera, Inc., Palo Alto, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/34 (2019.01); G06N 5/04 (2006.01); G06F 9/455 (2018.01); G06F 16/38 (2019.01); G06F 16/28 (2019.01); G06F 16/182 (2019.01);
U.S. Cl.
CPC ...
G06F 16/345 (2019.01); G06F 9/45558 (2013.01); G06F 16/288 (2019.01); G06F 16/38 (2019.01); G06N 5/04 (2013.01); G06F 16/182 (2019.01); G06F 2009/4557 (2013.01);
Abstract

Transient computing clusters can be temporarily provisioned in cloud-based infrastructure to run data processing tasks. Such tasks may be run by services operating in the clusters that consume and produce data including operational metadata. Techniques are introduced for tracking data lineage across multiple clusters, including transient computing clusters, based on the operational metadata. In some embodiments, operational metadata is extracted from the transient computing clusters and aggregated at a metadata system for analysis. Based on the analysis of the metadata, operations can be summarized at a cluster level even if the transient computing cluster no longer exists. Further relationships between workflows, such as dependencies or redundancies, can be identified and utilized to optimize the provisioning of computing clusters and tasks performed by the computing clusters.
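
The abstract does not specify record formats or algorithms, so the following is only a rough illustrative sketch of the general idea, not the patented method. It assumes a hypothetical `JobRecord` type in which each job reports its cluster, its input datasets, and its output datasets. It then shows how such records, once collected in one place, could be summarized per cluster and scanned for cross-cluster dependencies and duplicate work. All names and dataset paths here are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical operational-metadata record for one job run in a cluster."""
    cluster_id: str
    job_id: str
    inputs: frozenset   # names of datasets the job read
    outputs: frozenset  # names of datasets the job wrote


def summarize_by_cluster(records):
    """Aggregate the datasets each cluster read and wrote."""
    summary = defaultdict(lambda: {"inputs": set(), "outputs": set()})
    for rec in records:
        summary[rec.cluster_id]["inputs"] |= rec.inputs
        summary[rec.cluster_id]["outputs"] |= rec.outputs
    return dict(summary)


def cluster_dependencies(summary):
    """Return (upstream, downstream) pairs where one cluster reads another's output."""
    deps = set()
    for up, up_entry in summary.items():
        for down, down_entry in summary.items():
            if up != down and up_entry["outputs"] & down_entry["inputs"]:
                deps.add((up, down))
    return deps


def redundant_jobs(records):
    """Group job ids whose inputs and outputs are identical (candidate redundancies)."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec.inputs, rec.outputs)].append(rec.job_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Made-up records; the clusters that produced them may already be gone.
records = [
    JobRecord("cluster-a", "ingest", frozenset({"raw/events"}), frozenset({"clean/events"})),
    JobRecord("cluster-b", "report", frozenset({"clean/events"}), frozenset({"reports/daily"})),
    JobRecord("cluster-c", "ingest-copy", frozenset({"raw/events"}), frozenset({"clean/events"})),
]

summary = summarize_by_cluster(records)
dependencies = cluster_dependencies(summary)
duplicates = redundant_jobs(records)
```

Because the summary is built from the stored records rather than from the live clusters, it remains available after a transient cluster has been torn down. Treating identical input and output sets as a redundancy is a deliberately crude stand-in for whatever analysis a real metadata system would perform.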

