The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Aug. 17, 1999
Filed:
Apr. 03, 1998
AT & T Corp., Middletown, NJ (US)
Abstract
A fault tolerant message passing system includes a plurality of interconnected processors with storage and a watchdog process wherein the processors may undergo failure. A method restores a consistent system state using an optimistic logging protocol with asynchronous recovery. Each process comprises a sequence of state intervals and includes checkpoints for saving in storage the state of the process sufficient to re-start execution of the process. Non-deterministic event messages are logged in storage by each process for replay after process re-start to reconstruct pre-failure state intervals. Transitive dependency tracking of messages and process states is performed to record the highest-index state interval of each process upon which a local process depends. A variable size dependency vector is attached to each outgoing message sent between processes. An integer K is assigned to each outgoing message as the upper bound on the vector size. The vector for the local process is updated upon receiving each incoming message. A process failure is detected and the failed process is re-started. The latest checkpoint is restored and the logged messages are replayed. A new incarnation of the failed process is started and identified by $P_{i,t}$, where (i) is the process number and (t) is the incarnation number, each state interval being identified by $(t,x)_i$, where (x) is the state interval number. A failure announcement is broadcast to the other processes, the announcement containing $(t,x)_i$, where (x) is the state interval number of the last recreatable state interval of the failed process incarnation $P_{i,t}$. Upon receiving a failure announcement containing $(t,x)_i$, the entry for process (i) is extracted from the local dependency vector. The entry for process (i) is compared to the $(t,x)_i$ contained in the failure announcement. The process is classified as orphaned from the comparison if the process depends upon a higher-index state interval than $(t,x)_i$. A process roll-back is performed to reconstruct only non-orphaned state intervals in the rolled-back process.
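The dependency tracking and orphan check described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Process` class, its method names, and the dictionary representation of the dependency vector are hypothetical. The sketch also makes three assumptions the abstract does not spell out: each received message starts a new state interval, entries are merged by lexicographic comparison of (incarnation, interval) pairs, and a process counts as orphaned only when its entry refers to the same incarnation as the announcement with a higher state interval number. The upper bound K on the vector size, checkpointing, and message replay are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (incarnation number t, state interval number x); hypothetical representation
Interval = Tuple[int, int]


@dataclass
class Process:
    pid: int
    incarnation: int = 0
    interval: int = 0
    # Dependency vector: process id -> highest-index state interval depended upon
    deps: Dict[int, Interval] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.deps[self.pid] = (self.incarnation, self.interval)

    def send(self) -> Dict[int, Interval]:
        """Return a copy of the dependency vector to attach to an outgoing message."""
        return dict(self.deps)

    def receive(self, attached: Dict[int, Interval]) -> None:
        """Update the local vector from the vector attached to an incoming message."""
        # Assumption: delivering a message begins a new local state interval.
        self.interval += 1
        self.deps[self.pid] = (self.incarnation, self.interval)
        for pid, entry in attached.items():
            if pid == self.pid:
                continue  # the local entry is always the current interval
            # Keep the higher-index interval (tuples compare lexicographically).
            if pid not in self.deps or entry > self.deps[pid]:
                self.deps[pid] = entry

    def is_orphaned_by(self, failed_pid: int, announced: Interval) -> bool:
        """Compare the local entry for failed_pid with a failure announcement (t, x)."""
        entry: Optional[Interval] = self.deps.get(failed_pid)
        if entry is None:
            return False  # no dependency on the failed process
        t, x = announced
        # Assumption: orphaned only if it depends on a lost interval of incarnation t.
        return entry[0] == t and entry[1] > x


# Example: p0 reaches interval 1, p1 then depends on it, p0 fails and can
# only recreate interval 0 of incarnation 0.
p0, p1, p2 = Process(0), Process(1), Process(2)
p0.receive(p2.send())   # p0 is now in state interval (0, 1)
p1.receive(p0.send())   # p1 transitively depends on (0, 1) of p0
announcement = (0, 0)   # last recreatable interval of the failed incarnation
p1_orphaned = p1.is_orphaned_by(0, announcement)
p2_orphaned = p2.is_orphaned_by(0, announcement)
```

In this example `p1` depends on a state interval of `p0` that cannot be recreated, so it would have to roll back, while `p2` never received anything from `p0` and is unaffected.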