The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Mar. 09, 2010
Filed:
Dec. 31, 2003
Andrew S. Laucius, Seattle, WA (US);
Darren A. Shakib, North Bend, WA (US);
Eytan D. Seidman, Seattle, WA (US);
Jonathan Forbes, Bellevue, WA (US);
Keith A. Birney, Redmond, WA (US);
Microsoft Corporation, Redmond, WA (US);
Abstract
A system and method facilitating incremental web crawl(s) using chunk(s) is provided. The system can be employed, for example, to facilitate a web-crawling system that crawls (e.g., continuously) the Internet for information (e.g., data) and indexes the information so that it can be used as part of a web search engine. The system facilitates incremental re-crawls and/or selective updating of information (e.g., documents) using a structure called a chunk to simplify the process of an incremental crawl. A chunk is a set of documents that can be manipulated as a set (e.g., of up to 65,536 (64K) documents). 'Document' refers to a corpus of data that is stored at a particular URL (e.g., HTML, PDF, PS, PPT, XLS, and/or DOC files, etc.). A chunk is created by an indexer. The indexer can place into a chunk documents that have similar property(ies). These property(ies) include but are not limited to: average time between change and average importance. These property(ies) can be stored at the chunk level in a chunk map. The chunk map can then be employed (e.g., on a daily basis) to determine which chunk(s) should be re-crawled.
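The chunk map described in the abstract can be pictured with a minimal Python sketch. This is an illustration only, not the patented implementation: the names `ChunkInfo`, `chunks_to_recrawl`, and `MAX_CHUNK_DOCS`, the use of day numbers for time, and the selection rule (re-crawl a chunk once the time since its last crawl reaches its average time between change, most important chunks first) are all assumptions made for this example. The abstract only says that the chunk-level properties are stored in a chunk map and consulted, e.g., daily.

```python
from dataclasses import dataclass
from typing import Dict, List

# Example upper bound on chunk size taken from the abstract (64K documents).
MAX_CHUNK_DOCS = 65_536


@dataclass
class ChunkInfo:
    """Chunk-level properties kept in the chunk map (hypothetical layout)."""
    avg_days_between_change: float  # average time between change, in days
    avg_importance: float           # average importance of the documents
    last_crawled_day: int           # day number of the most recent crawl
    doc_count: int = 0              # number of documents in the chunk

    def __post_init__(self) -> None:
        if not 0 <= self.doc_count <= MAX_CHUNK_DOCS:
            raise ValueError("doc_count must be between 0 and MAX_CHUNK_DOCS")


def chunks_to_recrawl(chunk_map: Dict[int, ChunkInfo], today: int) -> List[int]:
    """Return IDs of chunks that look due for a re-crawl.

    Assumed policy (not from the patent): a chunk is due when the days
    elapsed since its last crawl are at least its average time between
    change; due chunks are ordered by descending average importance.
    """
    due = [
        (chunk_id, info)
        for chunk_id, info in chunk_map.items()
        if today - info.last_crawled_day >= info.avg_days_between_change
    ]
    due.sort(key=lambda item: item[1].avg_importance, reverse=True)
    return [chunk_id for chunk_id, _ in due]
```

Because the decision uses only per-chunk properties, a daily pass needs to read just the chunk map rather than inspect every document individually.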