The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
Dec. 09, 2014
Filed:
Dec. 09, 2011
Applicants:
Saha Ankan, Chicago, IL (US);
Arindam Banerjee, Roseville, MN (US);
Shiva P. Kasiviswanathan, White Plains, NY (US);
Richard D. Lawrence, Ridgefield, CT (US);
Prem Melville, New York, NY (US);
Vikas Sindhwani, Hawthorne, NY (US);
Edison L. Ting, San Jose, CA (US);
Inventors:
Saha Ankan, Chicago, IL (US);
Arindam Banerjee, Roseville, MN (US);
Shiva P. Kasiviswanathan, White Plains, NY (US);
Richard D. Lawrence, Ridgefield, CT (US);
Prem Melville, New York, NY (US);
Vikas Sindhwani, Hawthorne, NY (US);
Edison L. Ting, San Jose, CA (US);
Assignee:
International Business Machines Corporation, Armonk, NY (US);
Abstract
A method, system and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices using text in the documents, and analyzing these matrices to identify a first group of topics as evolving topics and a second group of topics as emerging topics. The matrices include a first matrix X identifying a multitude of words in each of the documents, a second matrix W identifying a multitude of topics in each of the documents, and a third matrix H identifying a multitude of words for each of the multitude of topics. In an embodiment, the documents form a streaming dataset, and two forms of temporal regularizers are used to help identify the evolving topics and the emerging topics in the streaming dataset.
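The relationship between the three matrices named in the abstract can be illustrated with a small sketch. This is not the patented method: it is a plain non-negative matrix factorization (X ≈ W·H) using standard multiplicative updates, with no temporal regularizers and no distinction between evolving and emerging topics. The function name factorize_topics, the toy word counts, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def factorize_topics(X, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF sketch: find non-negative W, H with X ~= W @ H.

    X: documents x words, W: documents x topics, H: topics x words.
    Uses generic multiplicative updates (an assumption, not taken
    from the patent text).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_words)) + eps
    for _ in range(n_iter):
        # Update topic-word weights, then document-topic weights.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy corpus: 4 documents over a 6-word vocabulary.
X = np.array([
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 0, 1, 2, 3],
], dtype=float)

W, H = factorize_topics(X, n_topics=2)
residual = np.linalg.norm(X - W @ H)  # reconstruction error
```

Each row of W gives the topic weights for one document, and each row of H gives the word weights for one topic. A streaming variant along the lines the abstract describes would additionally constrain how H changes between time steps, which this sketch does not attempt.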