The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent: Oct. 14, 2025

Filed: Jun. 28, 2022

Applicant: Hewlett Packard Enterprise Development LP, Houston, TX (US);

Inventors:
Annmary Justine Koomthanam, Bangalore, IN;
Suparna Bhattacharya, Bangalore, IN;
Aalap Tripathy, Houston, TX (US);
Sergey Serebryakov, Milpitas, CA (US);
Martin Foltin, Ft. Collins, CO (US);
Paolo Faraboschi, Milpitas, CA (US);

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/215 (2019.01); G06F 16/25 (2019.01); G06F 16/27 (2019.01); G06F 18/214 (2023.01); G06N 20/00 (2019.01);
U.S. Cl.
CPC ...
G06F 16/215 (2019.01); G06F 16/254 (2019.01); G06F 16/27 (2019.01); G06F 18/214 (2023.01); G06N 20/00 (2019.01);
Abstract

Systems and methods are provided for automatically constructing data lineage representations for distributed data processing pipelines. These data lineage representations (which are constructed and stored in a central repository shared by the multiple data processing sites) can be used to, among other things, clone the distributed data processing pipeline for quality assurance or debugging purposes. Examples of the presently disclosed technology are able to construct data lineage representations for distributed data processing pipelines by (1) generating a hash content value for universally identifying each data artifact of the distributed data processing pipeline across the multiple processing stages/processing sites of the distributed data processing pipeline; and (2) creating a data processing pipeline abstraction hierarchy for associating each data artifact to input and output events for given executions of given data processing stages (performed by the multiple data processing sites).
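
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the use of SHA-256, the names `content_hash`, `Execution`, and `LineageStore`, and the in-memory list standing in for the shared central repository are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def content_hash(data: bytes) -> str:
    """Identify an artifact purely by its bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Execution:
    """One run of a stage at a site, with its input and output events."""
    stage: str
    site: str
    inputs: List[str] = field(default_factory=list)   # hashes of artifacts consumed
    outputs: List[str] = field(default_factory=list)  # hashes of artifacts produced


@dataclass
class LineageStore:
    """Hypothetical stand-in for the central repository (in-memory only)."""
    executions: List[Execution] = field(default_factory=list)

    def record(self, stage: str, site: str,
               inputs: List[bytes], outputs: List[bytes]) -> Execution:
        """Log an execution, referring to artifacts by content hash."""
        execution = Execution(
            stage=stage,
            site=site,
            inputs=[content_hash(b) for b in inputs],
            outputs=[content_hash(b) for b in outputs],
        )
        self.executions.append(execution)
        return execution

    def producers(self, artifact_hash: str) -> List[Execution]:
        """Executions that emitted the artifact as an output."""
        return [e for e in self.executions if artifact_hash in e.outputs]

    def consumers(self, artifact_hash: str) -> List[Execution]:
        """Executions that read the artifact as an input."""
        return [e for e in self.executions if artifact_hash in e.inputs]
```

Because the identifier depends only on content, a site that receives a copy of an artifact computes the same hash as the site that produced it, so `producers` and `consumers` can link executions across sites without a shared naming scheme.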

