The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:

Aug. 23, 2016

Filed:

Mar. 04, 2014

Applicant:

SAS Institute Inc., Cary, NC (US);

Inventors:

Patrick Hall, Chapel Hill, NC (US);

Ilknur Kaynar Kabul, Apex, NC (US);

Warren Sarle, Gainesville, FL (US);

Jorge Silva, Durham, NC (US);

Assignee:

SAS Institute Inc., Cary, NC (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 17/30 (2006.01); G06K 9/62 (2006.01);
U.S. Cl.
CPC ...
G06F 17/30598 (2013.01); G06K 9/6222 (2013.01);
Abstract

A method of determining a number of clusters for a dataset is provided. Centroid locations for a defined number of clusters are determined using a clustering algorithm. Boundaries for each of the defined clusters are defined. A reference distribution that includes a plurality of data points is created. The plurality of data points are within the defined boundary of at least one cluster of the defined clusters. Second centroid locations for the defined number of clusters are determined using the clustering algorithm and the reference distribution. A gap statistic for the defined number of clusters based on a comparison between a first residual sum of squares and a second residual sum of squares is computed. The processing is repeated for a next number of clusters to create. An estimated best number of clusters for the received data is determined by comparing the gap statistic computed for each iteration of the number of clusters.
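The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how cluster boundaries are defined, how the reference points are drawn, or how the gap statistics are compared, so the sketch assumes axis-aligned bounding boxes, uniform sampling inside each box, a gap of log(reference RSS) minus log(data RSS), and selection of the k with the largest gap. The clustering algorithm is assumed to be plain Lloyd's k-means with farthest-first seeding, and all function names (`kmeans`, `reference_in_boxes`, `estimate_clusters`, `make_blobs`) are hypothetical.

```python
import math
import random


def kmeans(points, k, rng, iters=100):
    """Lloyd's k-means with farthest-first seeding (an assumed choice).

    Returns (centroids, labels, rss); rss is the residual sum of squares.
    """
    # Seed with one random point, then repeatedly add the point that is
    # farthest from every seed chosen so far.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    def assign(cs):
        return [min(range(k), key=lambda j: math.dist(p, cs[j])) for p in points]

    for _ in range(iters):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[j])  # empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(centroids)
    rss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, rss


def reference_in_boxes(points, labels, k, rng):
    """Draw one uniform point per data point inside its cluster's bounding box."""
    reference = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue
        lows = [min(x) for x in zip(*members)]
        highs = [max(x) for x in zip(*members)]
        for _ in members:
            reference.append(
                tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
            )
    return reference


def estimate_clusters(points, k_max, seed=0):
    """Return (best_k, gaps) where gaps maps each k in 1..k_max to its gap."""
    rng = random.Random(seed)
    gaps = {}
    for k in range(1, k_max + 1):
        _, labels, rss_data = kmeans(points, k, rng)
        reference = reference_in_boxes(points, labels, k, rng)
        _, _, rss_ref = kmeans(reference, k, rng)
        # Assumed form of the gap: log of reference RSS minus log of data RSS.
        gaps[k] = math.log(rss_ref) - math.log(rss_data)
    # Assumed comparison rule: take the k with the largest gap.
    best_k = max(gaps, key=gaps.get)
    return best_k, gaps


def make_blobs(centers, n_per_blob, seed=1):
    """Synthetic 2-D Gaussian blobs with unit standard deviation (demo data)."""
    rng = random.Random(seed)
    return [
        (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        for cx, cy in centers
        for _ in range(n_per_blob)
    ]


data = make_blobs([(0, 0), (12, 0), (6, 10)], 100)
best_k, gaps = estimate_clusters(data, k_max=5)
print(best_k, {k: round(g, 2) for k, g in gaps.items()})
```

With three well-separated blobs, the gap for k = 3 should exceed the gaps for k = 1 and k = 2, because a uniform fill of a box that spans several blobs has a smaller spread than the blobs themselves. Splitting a true cluster further can score about as well as the true k under this simple box reference, so the largest-gap rule here is only a placeholder for whatever comparison the full patent document specifies.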

