The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
Jun. 03, 2008
Filed:
Apr. 14, 2005
Gheorghe Almasi, Ardsley, NY (US);
Matthias Augustin Blumrich, Ridgefield, CT (US);
Dong Chen, Croton-On-Hudson, NY (US);
Paul Coteus, Yorktown, NY (US);
Alan Gara, Mount Kisco, NY (US);
Mark E. Giampapa, Irvington, NY (US);
Philip Heidelberger, Cortlandt Manor, NY (US);
Dirk I. Hoenicke, Ossining, NY (US);
Sarabjeet Singh, Mississauga, CA;
Burkhard D. Steinmacher-Burow, Wernau, DE;
Todd Takken, Brewster, NY (US);
Pavlos Vranas, Bedford Hills, NY (US);
International Business Machines Corporation, Armonk, NY (US);
Abstract
Methods and apparatus perform fault isolation in multiple node computing systems using commutative error detection values (for example, checksums) to identify and to isolate faulty nodes. When information associated with a reproducible portion of a computer program is injected into a network by a node, a commutative error detection value is calculated. At intervals, node fault detection apparatus associated with the multiple node computer system retrieves the commutative error detection values associated with the node and stores them in memory. When the computer program is executed again by the multiple node computer system, new commutative error detection values are created and stored in memory. The node fault detection apparatus identifies faulty nodes by comparing commutative error detection values associated with reproducible portions of the application program generated by a particular node from different runs of the application program. Differences in values indicate a possible faulty node.
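
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The 32-bit width, the use of a per-packet CRC-32 summed modulo 2^32, and the names `commutative_checksum`, `snapshot`, and `suspect_nodes` are all assumptions made for illustration. The only property taken from the abstract is that the value is commutative, so the order in which a node injects packets does not change it.

```python
import zlib
from typing import Dict, Iterable, List

MASK32 = 0xFFFFFFFF  # assumed 32-bit checksum width (not stated in the abstract)


def commutative_checksum(packets: Iterable[bytes]) -> int:
    """Sum per-packet CRC-32 values modulo 2**32.

    Addition is commutative, so the result does not depend on packet order.
    """
    total = 0
    for packet in packets:
        total = (total + zlib.crc32(packet)) & MASK32
    return total


def snapshot(injected: Dict[int, List[bytes]]) -> Dict[int, int]:
    """Record one checksum per node for the packets it injected (hypothetical helper)."""
    return {node: commutative_checksum(pkts) for node, pkts in injected.items()}


def suspect_nodes(first_run: Dict[int, int], second_run: Dict[int, int]) -> List[int]:
    """Return the nodes whose stored checksums differ between two runs."""
    return sorted(n for n in first_run if first_run[n] != second_run.get(n))


# Node 0 injects the same packets in a different order; node 1 corrupts one byte.
run1 = snapshot({0: [b"alpha", b"beta"], 1: [b"gamma", b"delta"]})
run2 = snapshot({0: [b"beta", b"alpha"], 1: [b"gamma", b"delte"]})
print(suspect_nodes(run1, run2))  # prints [1]
```

A mismatch only marks a node as a possible fault, as the abstract notes. A real system would also take checksums at intervals during a run rather than once at the end, which this sketch omits.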