The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Feb. 18, 2025

Filed:
Nov. 03, 2023

Applicant:
Databricks, Inc., San Francisco, CA (US);

Inventors:
Terry Kim, Belleview, WA (US);
Lin Ma, Ann Arbor, MI (US);
Rahul Shivu Mahadev, Santa Clara, CA (US);
Rahul Potharaju, San Ramon, CA (US);

Assignee:
Databricks, Inc., San Francisco, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/28 (2019.01); G06F 16/21 (2019.01); G06F 16/22 (2019.01);
U.S. Cl.
CPC ...
G06F 16/285 (2019.01); G06F 16/211 (2019.01); G06F 16/2246 (2019.01);
Abstract

The disclosed configurations provide a method (and/or a computer-readable medium or system) for determining, from a table schema describing keys of a data table, one or more clustering keys that can be used to cluster data files of a data table. The method includes generating features for the data table, generating tokens from the features, generating a prediction for each token by applying to the token a machine-learned transformer model trained to predict a likelihood that the key associated with the token is a clustering key for the data table, determining clustering keys based on the predictions, and clustering data records of the data table into data files based on key-values for the clustering keys.
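
The steps named in the abstract (features, tokens, per-token predictions, key selection, clustering) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the feature set (column type plus a hypothetical count of query filters per key), the string token format, the 0.5 threshold, and all function names. In particular, `stub_model` is a simple stand-in heuristic, not a machine-learned transformer, and grouping records by exact key-value is a simplification of how records might be clustered into data files.

```python
from collections import defaultdict

def make_features(schema, filter_counts):
    # Hypothetical features per key: its type and how often queries filter on it.
    return {
        key: {"type": dtype, "filter_count": filter_counts.get(key, 0)}
        for key, dtype in schema.items()
    }

def tokenize(features):
    # One token per key; the "name|type|count" format is an assumption.
    return {
        key: f"{key}|{f['type']}|{f['filter_count']}"
        for key, f in features.items()
    }

def stub_model(token):
    # Stand-in for the trained transformer (NOT a real model): maps the
    # filter count to a pseudo-likelihood in [0, 1).
    count = int(token.rsplit("|", 1)[1])
    return count / (count + 1)

def choose_clustering_keys(tokens, model, threshold=0.5):
    # Keep keys whose predicted likelihood reaches the (assumed) threshold.
    return [key for key, tok in tokens.items() if model(tok) >= threshold]

def cluster_records(records, keys):
    # Simplification: one "data file" per distinct combination of key-values.
    files = defaultdict(list)
    for rec in records:
        files[tuple(rec[k] for k in keys)].append(rec)
    return dict(files)

schema = {"id": "int", "country": "string", "note": "string"}
filter_counts = {"country": 9}  # hypothetical workload statistics
records = [
    {"id": 1, "country": "US", "note": "a"},
    {"id": 2, "country": "DE", "note": "b"},
    {"id": 3, "country": "US", "note": "c"},
]

tokens = tokenize(make_features(schema, filter_counts))
keys = choose_clustering_keys(tokens, stub_model)
files = cluster_records(records, keys)
print(keys)
print({k: [r["id"] for r in v] for k, v in files.items()})
```

In this toy run only `country` clears the threshold, so the two US records land in one group and the DE record in another. A real system would replace `stub_model` with the trained model's predictions and would likely use a layout strategy richer than exact-value grouping.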

