The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also links to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:

Sep. 02, 2025

Filed:

Jul. 05, 2023

Applicant:

Databricks, Inc., San Francisco, CA (US);

Inventors:

Prakhar Jain, Sunnyvale, CA (US);

Frederick Ryan Johnson, Orem, UT (US);

Terry Kim, Bellevue, WA (US);

Vijayan Prabhakaran, Los Gatos, CA (US);

Bart Samwel, Oegstgeest, NL;

Assignee:

Databricks, Inc., San Francisco, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/10 (2019.01); G06F 16/13 (2019.01); G06F 16/16 (2019.01); G06F 16/22 (2019.01); G06F 16/28 (2019.01);
U.S. Cl.
CPC ...
G06F 16/16 (2019.01); G06F 16/134 (2019.01); G06F 16/2246 (2019.01); G06F 16/285 (2019.01);
Abstract

A data processing service generates a data classifier tree for managing data files of a data table. The data classifier tree may be configured as a KD-classifier tree and includes a plurality of nodes and edges. A node of the data classifier tree may represent a splitting condition with respect to key-values for a respective key. A node of the data classifier tree may be associated with one or more data files assigned to the node. The data files assigned to the node each include a subset of records having key-values that satisfy the conditions represented by the node and parent nodes of the node. The data processing service may efficiently cluster the data in the data table while reducing the number of data files that are rewritten when data is modified or added to the data table.
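
The structure the abstract describes can be illustrated with a small sketch. This is not the patented implementation. It assumes a binary tree in which each internal node holds one splitting condition of the form `record[key] < threshold`, and it assigns data files to leaf nodes only, whereas the abstract also allows files on other nodes. The names `Node`, `leaf_for`, `candidate_files`, the column names, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Assumed splitting condition: records with record[key] < threshold
    # go left, all others go right. A leaf has key=None.
    key: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    files: list = field(default_factory=list)  # data files assigned to this node


def leaf_for(root: Node, record: dict) -> Node:
    """Follow the splitting conditions from the root down to a leaf."""
    node = root
    while node.key is not None:
        node = node.left if record[node.key] < node.threshold else node.right
    return node


def candidate_files(node: Node, key: str, value: float) -> list:
    """Collect files that may hold records where record[key] == value."""
    found = list(node.files)
    if node.key is None:
        return found
    if node.key != key:
        # This node splits on a different key, so both sides may match.
        return (found
                + candidate_files(node.left, key, value)
                + candidate_files(node.right, key, value))
    child = node.left if value < node.threshold else node.right
    return found + candidate_files(child, key, value)


# Hypothetical two-key table: split on "region_id", then on "day".
root = Node(
    key="region_id", threshold=100,
    left=Node(key="day", threshold=15,
              left=Node(files=["f1.parquet"]),
              right=Node(files=["f2.parquet"])),
    right=Node(files=["f3.parquet"]),
)

# A new file of records is attached to the one leaf its records fall under;
# files assigned to the other nodes are left untouched.
leaf_for(root, {"region_id": 42, "day": 20}).files.append("f4.parquet")

print(candidate_files(root, "day", 20))
```

In this sketch, adding `f4.parquet` touches only one leaf, which loosely mirrors the abstract's point about limiting how many files are rewritten when data is added. A lookup on `day` skips `f1.parquet` because the condition on that branch rules it out.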

