The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date the patent was issued, Date the patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, aka PDF).
Patent No.:
Date of Patent: Oct. 31, 2006
Filed: Mar. 22, 2002
Ioannis Tsochantaridis, Providence, RI (US);
Thorsten H. Brants, Palo Alto, CA (US);
Francine R. Chen, Menlo Park, CA (US);
Xerox Corporation, Stamford, CT (US);
Abstract
Systems and methods for determining the topic structure of a document including text utilize a Probabilistic Latent Semantic Analysis (PLSA) model and select segmentation points based on similarity values between pairs of adjacent text blocks. PLSA forms a framework for both text segmentation and topic identification. The use of PLSA provides an improved representation for the sparse information in a text block, such as a sentence or a sequence of sentences. Topic characterization of each text segment is derived from PLSA parameters that relate words to 'topics', latent variables in the PLSA model, and 'topics' to text segments. A system executing the method exhibits significant performance improvement. Once determined, the topic structure of a document may be employed for document retrieval and/or document summarization.
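The abstract names the pipeline (fit PLSA, compare adjacent text blocks, cut at low-similarity points) without giving its details. What follows is a minimal sketch of that idea in Python, not the patented method. It rests on several assumptions. Each text block is a list of tokens. PLSA is fitted directly on the blocks using the asymmetric form P(w|b) = Σ_z P(z|b) P(w|z), instead of being trained on a separate corpus and folded in. Similarity is the cosine between the topic mixtures P(z|b) of neighbouring blocks. A boundary goes at a local minimum of that similarity curve that falls below a threshold. The names `plsa`, `adjacent_similarities`, `segmentation_points` and the `threshold` rule are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def plsa(blocks, num_topics, iterations=50, seed=0):
    """Fit P(w|b) = sum_z P(z|b) P(w|z) with EM (illustrative sketch).

    blocks: list of non-empty token lists (e.g. a sentence or window each).
    Returns (vocab, p_z_given_b, p_w_given_z) as nested lists.
    """
    rng = random.Random(seed)
    vocab = sorted({w for block in blocks for w in block})
    index = {w: i for i, w in enumerate(vocab)}
    # Sparse term counts n(b, w) for each block.
    counts = [Counter(index[w] for w in block) for block in blocks]

    def normalize(row):
        total = sum(row)
        return [x / total for x in row] if total > 0 else [1.0 / len(row)] * len(row)

    # Random initialisation breaks the symmetry between latent topics.
    p_z_b = [normalize([rng.random() for _ in range(num_topics)]) for _ in blocks]
    p_w_z = [normalize([rng.random() for _ in vocab]) for _ in range(num_topics)]

    for _ in range(iterations):
        new_z_b = [[0.0] * num_topics for _ in blocks]
        new_w_z = [[0.0] * len(vocab) for _ in range(num_topics)]
        for b, block_counts in enumerate(counts):
            for w, n in block_counts.items():
                # E-step: posterior P(z|b,w) is proportional to P(z|b) P(w|z).
                joint = [p_z_b[b][z] * p_w_z[z][w] for z in range(num_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(num_topics):
                    # Accumulate expected counts n(b,w) * P(z|b,w) for the M-step.
                    expected = n * joint[z] / total
                    new_z_b[b][z] += expected
                    new_w_z[z][w] += expected
        # M-step: renormalise expected counts into conditional distributions.
        p_z_b = [normalize(row) for row in new_z_b]
        p_w_z = [normalize(row) for row in new_w_z]
    return vocab, p_z_b, p_w_z


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def adjacent_similarities(p_z_b):
    """Similarity between each pair of neighbouring blocks (i, i + 1)."""
    return [cosine(p_z_b[i], p_z_b[i + 1]) for i in range(len(p_z_b) - 1)]


def segmentation_points(sims, threshold):
    """Return indices i meaning "boundary between block i and block i + 1".

    Hypothetical rule: a local minimum of the similarity curve that also
    falls below `threshold`.
    """
    points = []
    for i, s in enumerate(sims):
        left = sims[i - 1] if i > 0 else float("inf")
        right = sims[i + 1] if i + 1 < len(sims) else float("inf")
        if s < threshold and s <= left and s <= right:
            points.append(i)
    return points


if __name__ == "__main__":
    text = [
        "the cat chased the dog",
        "a dog is a loyal pet",
        "my cat and dog share a pet bed",
        "the stock market fell today",
        "traders sold stock on the market",
        "market trade volume rose",
    ]
    blocks = [s.split() for s in text]
    _, p_z_b, _ = plsa(blocks, num_topics=2, iterations=100)
    sims = adjacent_similarities(p_z_b)
    print(segmentation_points(sims, threshold=0.5))
```

Because EM only finds a local optimum, the printed boundaries can depend on `seed` and `iterations`. The cosine over P(z|b) is one simple choice. Comparing smoothed word distributions P(w|b) instead is an equally plausible variant.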