The patent badge is an abbreviated version of the USPTO patent document. It lists the patent number, the date the patent was issued, the date it was filed, the title, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also links to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Oct. 18, 2011
Filed:
Jun. 30, 2004
Huican Zhu, San Jose, CA (US);
Maximilian Ibel, Pfaeffikon, CH;
Anurag Acharya, Campbell, CA (US);
Howard Bradley Gobioff, San Francisco, CA (US);
Google Inc., Mountain View, CA (US);
Abstract
A search engine crawler includes a distributed set of schedulers that are associated with one or more segments of document identifiers (e.g., URLs) corresponding to documents on a network (e.g., WWW). Each scheduler handles the scheduling of document identifiers (for crawling) for a subset of the known document identifiers. Using a starting set of document identifiers, such as the document identifiers crawled (or scheduled for crawling) during the most recent completed crawl, the scheduler removes from the starting set those document identifiers that have been unreachable in each of the last X crawls. Other filtering mechanisms may also be used to filter out some of the document identifiers in the starting set. The resulting list of document identifiers is written to a scheduled output file for use in a next crawl cycle.
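The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `schedule_next_crawl`, the `reachability_history` mapping, and the rule that a URL with fewer than X recorded crawls is kept are all assumptions made for illustration.

```python
from typing import Dict, List


def schedule_next_crawl(
    starting_urls: List[str],
    reachability_history: Dict[str, List[bool]],
    x: int,
) -> List[str]:
    """Return the URLs to schedule for the next crawl.

    reachability_history maps each URL to a list of booleans, oldest first,
    where True means the URL was reachable in that crawl (hypothetical
    structure, not taken from the patent).
    """
    scheduled = []
    for url in starting_urls:
        recent = reachability_history.get(url, [])[-x:]
        # Drop the URL only if it has a full X crawls of history
        # and was unreachable in every one of them.
        unreachable_in_all = len(recent) == x and not any(recent)
        if not unreachable_in_all:
            scheduled.append(url)
    return scheduled


starting = ["http://a.example/", "http://b.example/", "http://c.example/"]
history = {
    "http://a.example/": [True, False, False],
    "http://b.example/": [False, False, False],
    "http://c.example/": [False, False],
}
next_crawl = schedule_next_crawl(starting, history, x=3)
```

With X = 3, only `http://b.example/` is removed here, because it is the only URL that was unreachable in all three of the most recent crawls. In a distributed setup like the one the abstract describes, each scheduler would presumably run such a filter over its own segment of URLs and write the result to its scheduled output file.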