Dreese Lab

A Myrinet Testbed for Research in Distributed Real-Time Environments

Faculty: Chao-Ju Jennifer Hou, Ching-Chih Jason Han, Yuan F. Zheng, and Fusun Ozguner

Department: Electrical Engineering at the Ohio State University

Sponsor: NSF Instrumentation Grant for Research

The Department of Electrical Engineering at The Ohio State University requests funds to purchase state-of-the-art equipment, including (1) nine Sun SPARCstations and host interfaces, (2) one 8-port and seven 4-port Myrinet switches, and (3) two software licenses for the OpenInvention visualization package, to support distributed real-time system research in Computer and Information Science and Engineering. The equipment has been used for the following five research projects:

1. Design and implementation of a load sharing mechanism in distributed real-time environments.

In a heterogeneous distributed real-time environment, uneven task arrivals temporarily overload some nodes while leaving others idle or underloaded. Consequently, some tasks may miss their deadlines even if the overall system has the capacity to meet the deadlines of all tasks. We propose to design, implement, and evaluate a load sharing (LS) mechanism to enable "capable" nodes to share the workload of "incapable" ones and to maximize the probability of tasks meeting their deadlines in distributed real-time environments. In particular, we focus on:

  1. design of an effective LS scheme by carefully devising the transfer policy, the location policy, and the information policy, to tackle the problems identified, and

  2. implementation of the LS scheme as an experimental software prototype that lies between the operating system and the application programs on each node in the Myrinet testbed.
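As an illustration of how the three policies fit together, the following is a minimal sketch and not the project's actual LS scheme. It assumes a simple backlog-based test for whether a node is "capable", ignores task transfer delay, and uses hypothetical names throughout (Task, Node, can_meet, place).

```python
from dataclasses import dataclass


@dataclass
class Task:
    exec_time: float  # estimated execution time
    deadline: float   # relative deadline, measured from now


@dataclass
class Node:
    name: str
    backlog: float    # work already queued ahead of a new arrival


def can_meet(node, task):
    """Assumed capability test: the task finishes by its deadline behind the backlog."""
    return node.backlog + task.exec_time <= task.deadline


def place(task, local, view):
    """Return the name of the node that should run the task, or None to reject it.

    `view` stands in for the information policy: the (possibly stale) state of
    remote nodes that the local node has collected.
    """
    # Transfer policy: keep the task locally if its deadline can be met here.
    if can_meet(local, task):
        return local.name
    # Location policy: among capable remote nodes, pick the least loaded one.
    capable = [n for n in view if can_meet(n, task)]
    if not capable:
        return None
    return min(capable, key=lambda n: n.backlog).name
```

A real scheme would also have to account for the cost of moving a task and for out-of-date state information, both of which this sketch leaves out.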

2. Application of distance-constrained task model to message scheduling in distributed real-time environments.

The problem of guaranteeing the timely delivery of messages has been drawing considerable attention, especially in the context of transmitting voice/video data over a data network and communicating control/status information in embedded real-time systems. The proposed research aims to establish a formal basis for providing deterministic timing guarantees through runtime message scheduling. We employ the (C,D)-smooth message model to characterize traffic with timing requirements and develop, based on the distance-constrained task system (DCTS) model, an effective message scheduling scheme to transmit messages with end-to-end delay bounds and delay jitter bounds.

We have implemented the proposed message scheduling scheme in the Myrinet device driver on each node in the testbed and demonstrated the effectiveness of the proposed scheme through empirical experimentation and measurement.
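To make the notion of a distance constraint concrete, here is a small sketch of a checker and not the scheduling scheme itself. It assumes the usual reading of the DCTS model: the first completion must occur within c time units of time 0, and any two consecutive completions must be at most c apart. The jitter measure (largest gap minus smallest gap) and all function names are assumptions made for illustration.

```python
def completion_gaps(finish_times):
    """Distances between consecutive completion times."""
    return [b - a for a, b in zip(finish_times, finish_times[1:])]


def meets_distance_constraint(finish_times, c):
    """Check a sequence of completion times against distance constraint c."""
    if not finish_times:
        return True  # nothing was scheduled, so nothing is violated
    if finish_times[0] > c:
        return False  # first completion is too late
    return all(gap <= c for gap in completion_gaps(finish_times))


def jitter(finish_times):
    """Assumed jitter measure: spread between the largest and smallest gap."""
    gaps = completion_gaps(finish_times)
    return max(gaps) - min(gaps) if gaps else 0
```

For example, completion times 2, 5, 9 satisfy a distance constraint of 4, while 2, 5, 10 do not, because the last gap is 5.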

3. Design and implementation of a fast path restoration mechanism in distributed real-time environments.

In this project, we propose a fast path restoration mechanism for distributed environments that exploits the virtual path/virtual circuit (VP/VC) concept first proposed in the ATM Forum. Given a distributed system topology, the capacity of each physical link, and the primary virtual path (VP) layout at system initialization, the proposed mechanism pre-assigns one backup VP to each primary VP, using as few resources as possible, such that the failure of a single node/link does not lead to the failure of both the primary and backup VPs.

Upon physical node/link failure, the proposed mechanism redirects messages on the failed VPs to their corresponding backup VPs and locates new routes, in a decentralized manner, for injured backup VPs and second-generation backup VPs.

We have devised all the component algorithms and analyzed, via analytical modeling and simulation, the primary overhead incurred. We are configuring and implementing the solution algorithms as software daemons that reside at each node, and will empirically measure their performance.
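The backup pre-assignment step can be illustrated with a simplified sketch. It is not the project's algorithm: it assumes an undirected topology given as an adjacency dictionary, ignores link capacities, and approximates "as few resources as possible" by fewest hops. The function names are hypothetical. Given a primary VP, it searches for a backup that shares no intermediate node and no link with the primary, so that a single node/link failure cannot break both.

```python
from collections import deque


def shortest_path(adj, src, dst, banned_nodes=frozenset(), banned_links=frozenset()):
    """Breadth-first search for a fewest-hop path that avoids banned nodes and links."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:  # walk parents back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v in parent or v in banned_nodes or frozenset((u, v)) in banned_links:
                continue
            parent[v] = u
            queue.append(v)
    return None  # destination unreachable under the restrictions


def backup_for(adj, primary):
    """Find a backup path that is node- and link-disjoint from the primary path."""
    inner = frozenset(primary[1:-1])  # endpoints are necessarily shared
    links = frozenset(frozenset(e) for e in zip(primary, primary[1:]))
    return shortest_path(adj, primary[0], primary[-1], inner, links)
```

In a four-node ring A-B-D-C-A with primary VP A, B, D, the sketch returns A, C, D as the backup; in a chain A-B-C no disjoint backup exists and it returns None.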

4. Design and implementation of multisensor integration algorithms that take advantage of load sharing and real-time communication schemes.

Sensors are an important part of intelligent systems such as autonomous robots, computer-aided manufacturing systems, and intelligent vehicle and highway systems, to name a few. Because numerous sensors are needed for robust comprehension of complex environments, multisensor integration is essential for these systems. We propose to use a local area network as an effective approach for communication during the integration process. Since integration must be accomplished in real time, distributed real-time computing becomes an important research issue. We have studied multisensor integration algorithms that take advantage of the load sharing scheme and the real-time communication protocol, and will test new mechanisms on the proposed Myrinet local area network.

5. Design and implementation of matching and scheduling algorithms that allocate application tasks in distributed heterogeneous environments.

In heterogeneous distributed computing, a network of dissimilar machines is used to execute applications in parallel. With the availability of high-performance networking hardware such as Myrinet, it is practical to implement heterogeneous distributed systems where resources are shared by multiple applications. This type of execution environment is highly dynamic, which introduces uncertainty into the matching and scheduling process and makes effective utilization of such an environment challenging. We propose to employ probabilistic methods to solve the matching and scheduling problem. We are implementing the proposed methods on the Myrinet testbed, and will empirically evaluate them using real applications.

All five projects are composed of two synergistic parts: development of effective task management schemes, communication protocols, or multisensor integration algorithms based on a rigorous analytical foundation, and their validation with software system building and experiments. The experimentation is essential since a true evaluation of system performance can only be obtained through implementation and direct measurement. The research instrumentation allows the construction of an inexpensive distributed real-time environment to conduct the work described.


Date last modified -- July 30, 1997
Direct comments concerning this WWW site to: jhou@ece.osu.edu