The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title, applicant, inventor(s), assignee, attorney firm, primary examiner, assistant examiner, CPC classifications, and abstract. The badge also contains a link to the full patent document in Adobe Acrobat (PDF) format.
Patent No.:
Date of Patent: Jul. 05, 2022
Filed: Apr. 25, 2018
Applicant: Palantir Technologies Inc., Palo Alto, CA (US)
Inventors: Daniel Deutsch, New York, NY (US); Kyle Solan, San Francisco, CA (US); Thomas Mathew, New York, NY (US); Vasil Vasilev, Cambridge, MA (US)
Assignee: PALANTIR TECHNOLOGIES INC., Palo Alto, CA (US)
Abstract
Techniques for automatically scheduling builds of derived datasets in a distributed database system that supports pipelined data transformations are described herein. In an embodiment, a data processing method comprises obtaining a definition of at least one derived dataset of a data pipeline, and in response to the obtaining: creating and storing a dependency graph in memory, the dependency graph representing the at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends; detecting a first update to a first dataset from among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends, and in response to the first update: based on the dependency graph, initiating a first build of a first intermediate derived dataset that depends on the first dataset; initiating a second build that uses the first intermediate derived dataset and that is next in order in the data pipeline according to the dependency graph; asynchronously detecting a second update to a second dataset from among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends, and in response to the second update: based on the dependency graph, initiating a third build of a second intermediate derived dataset that depends on the second dataset; wherein the method is performed using one or more processors.
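The scheduling flow the abstract describes — maintain a dependency graph of raw, intermediate, and derived datasets, and on each update to an upstream dataset initiate builds of its dependents in pipeline order — can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: all names (`PipelineScheduler`, the dataset names) are made up, the traversal is a simple breadth-first walk suited to chain-shaped pipelines, and a production scheduler would additionally handle asynchronous updates and topologically sort diamond-shaped dependency graphs.

```python
from collections import defaultdict, deque

class PipelineScheduler:
    """Tracks which derived datasets are built from which inputs and
    computes the builds triggered when an input changes (illustrative)."""

    def __init__(self):
        # Maps a dataset to the datasets built directly from it.
        self.dependents = defaultdict(list)

    def add_dependency(self, upstream, derived):
        """Record that `derived` is built from `upstream`."""
        self.dependents[upstream].append(derived)

    def on_update(self, dataset):
        """Return the build order triggered by an update to `dataset`:
        a breadth-first walk over its downstream datasets, so for a
        chain each build runs before the build that consumes it."""
        order, seen = [], set()
        queue = deque(self.dependents[dataset])
        while queue:
            ds = queue.popleft()
            if ds in seen:
                continue
            seen.add(ds)
            order.append(ds)                 # initiate build of `ds`
            queue.extend(self.dependents[ds])  # then its dependents
        return order

# Example pipeline: raw -> intermediate -> derived
sched = PipelineScheduler()
sched.add_dependency("raw_events", "cleaned_events")
sched.add_dependency("cleaned_events", "daily_report")

print(sched.on_update("raw_events"))  # ['cleaned_events', 'daily_report']
```

An update to the raw dataset triggers a rebuild of the intermediate dataset and then of the derived dataset that consumes it, mirroring the first and second builds in the abstract; an update to the intermediate dataset alone would trigger only the downstream build.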