The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also links to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Jul. 17, 2001
Filed:
Nov. 02, 1999
Inventors:
Marc Alexander Najork, Palo Alto, CA (US);
Clark Allan Heydon, San Francisco, CA (US);
Janet Lynn Wiener, Sunnyvale, CA (US);
Assignee:
Alta Vista Company, Palo Alto, CA (US);
Abstract
A web crawler downloads documents from among a plurality of host computers. The web crawler enqueues document addresses in a data structure called the Frontier. The Frontier generally includes a set of queues, with all document addresses sharing a respective common host component being stored in a respective common one of the queues. Multiple threads substantially concurrently process the document addresses in the queues. The Frontier includes a set of parallel “priority queues,” each associated with a distinct priority level. Queue elements for documents to be downloaded are assigned a priority level, and then stored in the corresponding priority queue. Queue elements are then distributed from the priority queues to a set of underlying queues in accordance with their relative priorities. The threads then process the queue elements in the underlying queues. When performing a continuous crawl, the web crawler reinserts the queue element for a downloaded document into the Frontier in accordance with a download priority level associated with the downloaded document. For example, the download priority level may be determined as a function of an expiration date and time associated with the document whose document address is denoted by the queue element.
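The data flow described in the abstract can be illustrated with a minimal, single-threaded Python sketch. It is not the patented implementation: the class name `Frontier`, the helper `priority_from_expiry`, the one-hour bucket size, the convention that index 0 is the highest priority, and the "drain the highest priority first" distribution rule are all assumptions made for illustration. The abstract says only that elements are distributed "in accordance with their relative priorities" and that multiple threads process the underlying queues, which this sketch omits.

```python
import time
from collections import deque
from urllib.parse import urlsplit


class Frontier:
    """Toy frontier: priority queues feeding per-host underlying queues."""

    def __init__(self, num_priorities=3):
        # One FIFO per priority level; index 0 is the highest (assumption).
        self.priority_queues = [deque() for _ in range(num_priorities)]
        # One underlying FIFO per host, so all URLs of a host share a queue.
        self.host_queues = {}

    def enqueue(self, url, priority):
        """Store a URL in the priority queue for its assigned level."""
        self.priority_queues[priority].append(url)

    def distribute(self):
        """Move elements to the per-host queues, highest priority first."""
        for pq in self.priority_queues:
            while pq:
                url = pq.popleft()
                host = urlsplit(url).netloc
                self.host_queues.setdefault(host, deque()).append(url)

    def dequeue(self, host):
        """Return the next URL for a host, or None if its queue is empty."""
        q = self.host_queues.get(host)
        return q.popleft() if q else None

    def reinsert(self, url, expires_at, now=None):
        """Continuous crawl: requeue a downloaded URL based on its expiry."""
        now = time.time() if now is None else now
        levels = len(self.priority_queues)
        self.enqueue(url, priority_from_expiry(expires_at, now, levels))


def priority_from_expiry(expires_at, now, num_priorities, bucket_seconds=3600.0):
    """Hypothetical mapping: the sooner a document expires, the higher
    (numerically lower) its priority level."""
    remaining = max(0.0, expires_at - now)
    return min(num_priorities - 1, int(remaining // bucket_seconds))
```

In a multi-threaded crawler, each worker thread would typically own or lock one host queue at a time, so that no two threads download from the same host concurrently; that coordination is left out here to keep the sketch short.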