The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:

May 31, 2022

Filed:

Nov. 11, 2019

Applicant:

Salesforce.com, Inc., San Francisco, CA (US);

Inventors:

Ankit Chadha, Mountain View, CA (US);

Zeyuan Chen, Mountain View, CA (US);

Caiming Xiong, Menlo Park, CA (US);

Ran Xu, Palo Alto, CA (US);

Richard Socher, Menlo Park, CA (US);

Assignee:

salesforce.com, inc., San Francisco, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/30 (2019.01); G06F 16/22 (2019.01); G06F 16/28 (2019.01);
U.S. Cl.
CPC ...
G06F 16/2282 (2019.01); G06F 16/285 (2019.01);
Abstract

Embodiments described herein provide unsupervised density-based clustering to infer table structure from a document. Specifically, a number of words are identified from a block of text in a noneditable document, and the spatial coordinates of each word relative to the rectangular region containing the block are identified. Based on the word density of the rectangular region, the words are grouped into clusters using a heuristic radius search method. Words that are grouped into the same cluster are determined to be elements that belong to the same cell. In this way, the cells of the table structure can be identified. Once the cells are identified based on the word density of the block of text, the identified cells can be expanded horizontally or grouped vertically to identify rows or columns of the table structure.
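
The abstract does not spell out the heuristic itself, so the following Python sketch only illustrates the general idea and is not the patented method. It assumes each word is reduced to a center point (x, y) inside the rectangular region, derives a search radius from the region's word density, and treats words reachable from one another within that radius as one cell (a plain connected-components search, swapped in for the unspecified heuristic). The names `cluster_words` and `group_rows`, the `scale` factor, and the `tolerance` value are all hypothetical.

```python
from collections import deque
from math import hypot, sqrt


def cluster_words(words, width, height, scale=0.5):
    """Group (text, x, y) words into cells by a radius search.

    `scale` is a hypothetical tuning knob, not a value from the patent.
    """
    if not words:
        return []
    # Density-derived radius: a fraction of the mean spacing the words
    # would have if spread evenly over the width x height region.
    radius = scale * sqrt(width * height / len(words))

    unvisited = set(range(len(words)))
    cells = []
    while unvisited:
        seed = min(unvisited)  # deterministic seed order
        unvisited.remove(seed)
        queue, members = deque([seed]), [seed]
        while queue:
            _, cx, cy = words[queue.popleft()]
            # Collect every unvisited word within the radius of this word.
            near = [j for j in unvisited
                    if hypot(words[j][1] - cx, words[j][2] - cy) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        cells.append([words[i] for i in sorted(members)])
    return cells


def group_rows(cells, tolerance):
    """Merge cells whose mean y-coordinates lie within `tolerance` into rows."""
    def mean_y(cell):
        return sum(w[2] for w in cell) / len(cell)

    rows = []
    for cell in sorted(cells, key=mean_y):
        y = mean_y(cell)
        if rows and abs(y - rows[-1]["y"]) <= tolerance:
            rows[-1]["cells"].append(cell)
        else:
            rows.append({"y": y, "cells": [cell]})
    # Order cells left to right within each row.
    return [sorted(r["cells"], key=lambda c: min(w[1] for w in c)) for r in rows]


# Made-up word positions for a 2 x 2 table in a 400 x 100 region.
words = [
    ("Unit", 40, 20), ("price", 70, 20), ("Total", 240, 20), ("cost", 270, 20),
    ("Ten", 40, 70), ("dollars", 70, 70), ("Fifty", 240, 70), ("dollars", 270, 70),
]
cells = cluster_words(words, width=400, height=100)
rows = group_rows(cells, tolerance=10)
table = [[" ".join(w[0] for w in cell) for cell in row] for row in rows]
print(table)
```

With these sample coordinates the radius is about 35 units, so words 30 units apart within a cell are joined while cells 50 or more units apart stay separate. On real documents the radius heuristic and the row tolerance would need tuning, and column detection would follow the same pattern using x-coordinates.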

