____________

Incorporation of fault tolerance

____________

We incorporate fault tolerance into the task management system as follows. In module allocation, we achieve fault tolerance in three steps. First, we identify critical modules, that is, modules on whose completion the timely completion of the task system depends. Second, we replicate them. Third, we allocate the replicas to distinct processing nodes. Three issues need to be considered: (1) which modules are critical, (2) how many replicas to create, and (3) how to allocate the replicas.

We determine (1) via critical path analysis; (2) by striking a balance between the degree of fault tolerance and the system capacity; and (3) by coupling message scheduling with module allocation.
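The following is a minimal sketch of issues (1) and (3). It is written in Python and assumes the task system is a DAG of modules with known integer execution times.

* `critical_modules` runs a standard forward/backward critical path pass and returns the modules with zero slack.
* `allocate` is a hypothetical greedy helper. It gives each critical module `replicas` copies on distinct, least-loaded nodes. The `replicas` parameter stands in for the fault-tolerance/capacity trade-off of issue (2).

The sketch ignores message scheduling entirely, so it is not the coupled allocation described above, only an illustration of the replicate-and-separate idea.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_modules(durations, deps):
    """Return the set of modules with zero slack (on a critical path).

    durations: module -> execution time; every module must appear here.
    deps: module -> set of predecessor modules that must finish first.
    """
    graph = {m: set(deps.get(m, ())) for m in durations}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: earliest finish time of each module.
    eft = {}
    for m in order:
        est = max((eft[p] for p in graph[m]), default=0)
        eft[m] = est + durations[m]
    makespan = max(eft.values())

    # Successor lists for the backward pass.
    succ = {m: [] for m in graph}
    for m, preds in graph.items():
        for p in preds:
            succ[p].append(m)

    # Backward pass: latest start time that does not delay the makespan.
    lst = {}
    for m in reversed(order):
        lft = min((lst[s] for s in succ[m]), default=makespan)
        lst[m] = lft - durations[m]

    # Slack = latest start - earliest start; zero slack means critical.
    return {m for m in graph if lst[m] == eft[m] - durations[m]}


def allocate(durations, critical, nodes, replicas):
    """Hypothetical greedy placement: critical modules get `replicas`
    copies on distinct nodes, other modules get a single copy."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes to keep replicas on distinct nodes")
    load = {n: 0 for n in nodes}
    placement = {}
    for m, d in durations.items():
        k = replicas if m in critical else 1
        # The k least-loaded nodes, ties broken by node name; distinct by construction.
        chosen = sorted(nodes, key=lambda n: (load[n], n))[:k]
        for n in chosen:
            load[n] += d
        placement[m] = chosen
    return placement


durations = {"A": 2, "B": 3, "C": 1, "D": 2}
deps = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
crit = critical_modules(durations, deps)
print(sorted(crit))  # ['A', 'B', 'D']
print(allocate(durations, crit, ["n1", "n2", "n3"], replicas=2))
# {'A': ['n1', 'n2'], 'B': ['n3', 'n1'], 'C': ['n2'], 'D': ['n2', 'n3']}
```

In this example, module C has two time units of slack, so it is the only module left unreplicated. A node failure then cannot remove every copy of a module on the critical path.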

We achieve fault tolerance in load sharing by

To exploit checkpointing and rollback recovery techniques, we develop and implement an effective application-transparent checkpointing/rollback scheme.
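The toy below illustrates only the generic checkpoint-then-rollback cycle. State is saved every `interval` steps. After a simulated failure, the state is restored from the last checkpoint and the lost work is redone. The toy is not application-transparent: unlike the scheme described above, it requires the application to expose its state explicitly. It also keeps checkpoints in memory rather than on stable storage. All names (`Checkpointer`, `run`, `fail_at`) are hypothetical.

```python
import copy


class Checkpointer:
    """Toy application-level checkpointing of a state dict (kept in memory)."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)  # initial checkpoint

    def checkpoint(self):
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._saved)


def run(items, interval, fail_at=None):
    """Sum `items`, checkpointing every `interval` (>= 1) steps and
    simulating one failure when position `fail_at` is reached."""
    cp = Checkpointer({"pos": 0, "total": 0})
    failed = False
    while cp.state["pos"] < len(items):
        pos = cp.state["pos"]
        if pos == fail_at and not failed:
            failed = True
            cp.rollback()  # progress since the last checkpoint is lost
            continue
        cp.state["total"] += items[pos]
        cp.state["pos"] = pos + 1
        if cp.state["pos"] % interval == 0:
            cp.checkpoint()
    return cp.state["total"]


print(run(list(range(1, 11)), interval=3, fail_at=7))  # 55
```

In this run, the failure at position 7 rolls back to the checkpoint taken at position 6, so only one step is re-executed. The final result matches a failure-free run.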

The research results have been reported in the following papers:


Date last modified -- August 15, 1998
Direct comments concerning this WWW site to: jhou@ece.osu.edu