The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Jan. 26, 2016
Filed:
Dec. 16, 2011
Applicants:
Gautam Das, Irving, TX (US);
Hao H. Huang, McLean, VA (US);
Sandor Szalay, Baltimore, MD (US);
Nan Zhang, Fairfax, VA (US);
Inventors:
Gautam Das, Irving, TX (US);
Hao H. Huang, McLean, VA (US);
Sandor Szalay, Baltimore, MD (US);
Nan Zhang, Fairfax, VA (US);
Assignees:
The George Washington University, Washington, DC (US);
Board of Regents, The University of Texas System, Austin, TX (US);
The Johns Hopkins University, Baltimore, MD (US);
Abstract
As file systems reach the petabyte scale, users and administrators are increasingly interested in acquiring high-level analytical information for file management and analysis. Two particularly important tasks are the processing of aggregate and top-k queries, which, unfortunately, cannot be answered quickly by hierarchical file systems such as ext3 and NTFS. Existing pre-processing-based solutions, e.g., file system crawling and index building, consume a significant amount of time and space (for generating and maintaining the indexes), which in many cases cannot be justified by the infrequent usage of such solutions. User interests can often be sufficiently satisfied by approximate (i.e., statistically accurate) answers. A just-in-time sampling-based system can, after a small number of disk accesses, produce extremely accurate answers for a broad class of aggregate and top-k queries over a file system without requiring any prior knowledge. The system is efficient, accurate and scalable.
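The abstract does not spell out the sampling procedure, so the following is only an illustrative sketch of one generic way a sampling-based system could answer an aggregate query (here, "how many files are there?") without crawling or indexing. It uses random descents with inverse-probability weighting, which is an assumption chosen for illustration and not necessarily the patented method. The directory tree is modelled as nested Python dicts (a dict is a directory, any other value is a file), and the names `one_descent` and `estimate_file_count` are hypothetical.

```python
import random


def one_descent(tree, rng):
    """Walk one random root-to-leaf path and return an unbiased
    estimate of the total number of files in the tree."""
    estimate = 0.0
    prob = 1.0  # probability that this walk reaches the current directory
    node = tree
    while True:
        subdirs = [v for v in node.values() if isinstance(v, dict)]
        n_files = len(node) - len(subdirs)
        # Weight the files seen here by the inverse of the reach probability.
        estimate += n_files / prob
        if not subdirs:
            return estimate
        # Pick one subdirectory uniformly at random and continue downward.
        prob /= len(subdirs)
        node = rng.choice(subdirs)


def estimate_file_count(tree, walks=1000, seed=0):
    """Average many independent descents to reduce the variance."""
    rng = random.Random(seed)
    return sum(one_descent(tree, rng) for _ in range(walks)) / walks


# Toy tree with 14 files: directory "a" holds 10, directory "b/c" holds 4.
toy_tree = {
    "a": {f"file{i}": 1 for i in range(10)},
    "b": {"c": {f"file{i}": 1 for i in range(4)}},
}
approx = estimate_file_count(toy_tree)  # close to 14, not necessarily exact
```

Each descent reads only one directory per level, so the number of directory reads grows with the depth of the tree and the number of walks rather than with the total number of files. On a real file system the dict lookups would be replaced by directory listings, and accuracy would depend on how unevenly files are spread across subtrees.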