The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Patent No.:

US 11023710 B1

Date of Patent:

Jun. 01, 2021

Filed:

Feb. 20, 2019

Semi-supervised hybrid clustering/classification system

Applicants:

Peng Dai, Markham, CA;

Juwei Lu, North York, CA;

Bharath Sekar, North York, CA;

Wei Li, Markham, CA;

Jianpeng Xu, Markham, CA;

Ruiwen Li, Markham, CA;

Inventors:

Peng Dai, Markham, CA;

Juwei Lu, North York, CA;

Bharath Sekar, North York, CA;

Wei Li, Markham, CA;

Jianpeng Xu, Markham, CA;

Ruiwen Li, Markham, CA;

Assignee:

HUAWEI TECHNOLOGIES CO., LTD., Shenzhen, CN;

Attorney:

Primary Examiner:

Sheela C Chawan

Int. Cl.

CPC ...

G06K 9/00 (2006.01); G06K 9/62 (2006.01); G06N 20/00 (2019.01); G06F 16/75 (2019.01);

U.S. Cl.

CPC ...

G06K 9/00288 (2013.01); G06F 16/75 (2019.01); G06K 9/6218 (2013.01); G06K 9/6257 (2013.01); G06K 9/6267 (2013.01); G06N 20/00 (2019.01);

Abstract

System and method for classifying data objects occurring in an unstructured dataset, comprising: extracting feature vectors from the unstructured dataset, each feature vector representing an occurrence of a data object in the unstructured dataset; classifying the feature vectors into feature vector sets that each correspond to a respective object class from a plurality of object classes; for each feature vector set: performing multiple iterations of a clustering operation, each iteration including clustering feature vectors from the feature vector set into clusters of similar feature vectors and identifying outlier feature vectors, wherein for at least one iteration after a first iteration of the clustering operation, outlier feature vectors identified in a previous iteration are excluded from the clustering operation; and outputting a key cluster for the feature vector set from a final iteration of the multiple iterations, the key cluster including a greater number of similar feature vectors than any of the other clusters of the final iteration; and assembling a dataset that includes at least the feature vectors from the key clusters of the feature vector sets.
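The iterative step in the abstract can be illustrated with a minimal sketch. The abstract does not name a clustering algorithm or an outlier rule, so everything specific here is an assumption made for illustration: the distance-threshold grouping, the neighbor-count outlier test, the parameters `radius`, `min_neighbors` and `iterations`, and the function names `cluster_once`, `outlier_indices`, `key_cluster` and `assemble_dataset`. Feature extraction and classification are assumed to have happened upstream, so the input is a list of hypothetical (class label, feature vector) pairs.

```python
from collections import defaultdict
from math import dist


def cluster_once(vectors, radius):
    """Group vectors so that any two within `radius` share a cluster (assumed rule)."""
    clusters = []
    for v in vectors:
        near, far = [], []
        for c in clusters:
            (near if any(dist(v, m) <= radius for m in c) else far).append(c)
        # Merge v with every cluster it touches; keep the others unchanged.
        clusters = far + [[v] + [m for c in near for m in c]]
    return clusters


def outlier_indices(vectors, radius, min_neighbors):
    """Indices of vectors with fewer than `min_neighbors` others within `radius`."""
    out = set()
    for i, v in enumerate(vectors):
        neighbors = sum(
            1 for j, w in enumerate(vectors) if j != i and dist(v, w) <= radius
        )
        if neighbors < min_neighbors:
            out.add(i)
    return out


def key_cluster(vectors, radius=0.5, min_neighbors=1, iterations=3):
    """Cluster repeatedly, dropping the previous iteration's outliers each time,
    and return the largest cluster of the final iteration."""
    remaining = list(vectors)
    clusters = []
    for _ in range(iterations):
        clusters = cluster_once(remaining, radius)
        outliers = outlier_indices(remaining, radius, min_neighbors)
        if not outliers:
            break  # nothing left to exclude, so later iterations would not change
        remaining = [v for i, v in enumerate(remaining) if i not in outliers]
    return max(clusters, key=len, default=[])


def assemble_dataset(labeled_vectors, **kwargs):
    """Split vectors into per-class sets, then keep each set's key cluster."""
    sets = defaultdict(list)
    for label, vector in labeled_vectors:
        sets[label].append(vector)
    return {label: key_cluster(vs, **kwargs) for label, vs in sets.items()}


# Toy 2-D "feature vectors"; the labels and values are made up for illustration.
samples = (
    [("alice", p) for p in [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]]
    + [("alice", p) for p in [(5.0, 5.0), (5.1, 5.0)]]
    + [("alice", (10.0, 10.0))]
    + [("bob", p) for p in [(2.0, 2.0), (2.1, 2.0), (9.0, 9.0)]]
)
dataset = assemble_dataset(samples)
```

In this toy input, the isolated vectors at (10.0, 10.0) and (9.0, 9.0) are flagged as outliers in the first iteration and excluded from the second, and each class keeps only its most populous cluster. A real system would likely use a tuned clustering method on high-dimensional embeddings rather than this threshold rule.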
