The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Apr. 11, 2006

Filed:

Jan. 18, 2001
Applicants:

Michael Burrows, Palo Alto, CA (US);

Keith H. Randall, Sunnyvale, CA (US);

Raymond P. Stata, Palo Alto, CA (US);

Rajiv G. Wickremesinghe, Durham, NC (US);

Inventors:

Michael Burrows, Palo Alto, CA (US);

Keith H. Randall, Sunnyvale, CA (US);

Raymond P. Stata, Palo Alto, CA (US);

Rajiv G. Wickremesinghe, Durham, NC (US);

Assignee:
Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 7/00 (2006.01);
U.S. Cl.
CPC ...
Abstract

A web crawler system includes a central processing unit for performing computations in accordance with stored procedures and a network interface for accessing remotely located computers via a network. A web crawler module downloads pages from remotely located servers via the network interface. A first link processing module obtains page link information from the downloaded page; the page link information includes for each downloaded page a row of page identifiers of other pages. A second link processing module encodes the rows of page identifies in a space efficient manner. It arranges the rows of page identifiers in a particular order. For each respective row it identifies a prior row, if any, that best matches the respective row in accordance with predefined row match criteria, determines a set of deletes representing page identifiers in the identified prior row not in the respective row, and determines a set of adds representing page identifiers in the respective row not in the identifier prior row. The second link processing module delta encodes the set of deletes and delta encodes the set of adds for each respective row, and then Huffman codes the delta encoded set of deletes and delta encoded set of adds for each respective row.


Find Patent Forward Citations

Loading…