The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Nov. 20, 2001

Filed:

Nov. 02, 1999
Applicant:
Inventors:

Marc Alexander Najork, Palo Alto, CA (US);

Clark Allan Heydon, San Francisco, CA (US);

Assignee:

AltaVista Company, Palo Alto, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 1/300 ;
U.S. Cl.
CPC ...
G06F 1/300 ;
Abstract

A web crawler downloads data sets from among a plurality of host computers. The web crawler enqueues data set addresses in a set of queues, with all data set addresses sharing a respective common host address being stored in a respective common one of the queues. Each non-empty queue is assigned a next download time. Multiple threads substantially concurrently process the data set addresses in the queues. The number of queues is at least as great as the number of threads, and the threads are dynamically assigned to the queues. In particular, each thread selects a queue not being serviced by any of the other threads. The queue is selected in accordance with the next download times assigned to the queues. The data set corresponding to a data set address in the selected queue is downloaded and processed, and the data set address is dequeued from the selected queue. When the selected queue is not empty after the dequeuing step, it is assigned an updated download time. Then the thread deselects the selected queue, and the process of selecting a queue and processing a data set repeats. The next download time assigned to each queue is preferably a function of the length of time it took to download a previous document whose address was stored in the queue. For instance, the next download time may be set equal to the current time plus the a scaling constant multiplied by the download time of the previous document.


Find Patent Forward Citations

Loading…