PAWS [electronic resource] : Collective interactions and data transfers
- Washington, D.C. : United States. Dept. of Energy, 2001.
Oak Ridge, Tenn. : Distributed by the Office of Scientific and Technical Information, U.S. Dept. of Energy.
- Physical Description:
- 10 pages : digital, PDF file
- Additional Creators:
- Los Alamos National Laboratory
United States. Department of Energy
United States. Department of Energy. Office of Scientific and Technical Information
- Most high performance scientific components or applications are implemented as parallel programs operating on physically or logically distributed data. As we consider the interaction between such components, two major issues arise: (1) the definition of what exactly it means for two parallel components to interact, for example in terms of synchronization, and (2) how those components can most efficiently exchange the distributed data they operate on. Since both are common and important, significant efforts have been expended to implement them efficiently. Many of those efforts were, and still are, undertaken by application developers (see [Cou99] for an example). Several attempts have been made to develop generic frameworks solving this problem; [FKKCSCi, KG97a, BFHM98, GKP971] have all addressed aspects of it. Unfortunately, all of these solutions are limited to the set of applications that fell within the experience of their developers, and therefore none of them has been fully successful in providing a general solution. Several factors make a general solution difficult to produce. First, data redistribution depends on data representation, which is very often specific to an application. Therefore, developing a standardized solution for distributed data transfer depends on developing a standardized data representation. Further, different systems assume different transfer logistics, such as timing of transfer, locking of data, and synchronization assumptions. Finally, the shape of abstractions in different systems depends on time and tolerance of different users. The Common Component Architecture (CCA) effort is promising with respect to addressing these challenges, as it has already introduced a standardized system of interactions [AGG+99] and is in the process of defining standardized representations for distributed data. Furthermore, CCA builds on the sum of experiences of its participants.
In this paper we summarize our most recent contributions to the CCA design process related to the interactions of parallel components, called collective components. We introduce the notion of a collective port, which is an extension of the CCA ports [AGG+99] and allows collective components to interact as one entity. This functionality is not found in other existing standards of the day such as [OMG95, Ses97] and represents a significant extension of these standards. The usefulness and efficiency of similar abstractions have been shown in [KG97a, KG97b]. The abstraction described here extends them in that it allows the programmer to define the performance/utility trade-off of his or her choice. We further describe a class of translation components, which translate from the distributed data format used by one parallel implementation to that used by another. A well-known example of such components is the MxN component, which translates from data distributed on M processors to data distributed on N processors. We describe its implementation in PAWS and the supporting data structures. We also present a mechanism allowing the framework to invoke this component on the programmer's behalf whenever such translation is necessary, freeing the programmer from treating collective component interactions as a special case. In doing that we introduce user-defined distributed type casts. Finally, we discuss our initial experiments in building complex translation components out of atomic functionalities. Since PAWS assumes a distributed memory model, our experiments are limited to dense rectilinear data. We describe a PAWS application to illustrate the results of this discussion.
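To illustrate the MxN translation mentioned in the abstract, the following is a minimal sketch in Python, not the PAWS implementation. It assumes a one-dimensional array with a simple block distribution, and the names `block_bounds`, `mxn_schedule`, and `redistribute` are hypothetical helpers introduced here for illustration only. The sketch computes which index ranges each of the M source processes would send to each of the N destination processes, then applies that schedule to in-memory lists standing in for per-process data.

```python
from typing import List, Tuple


def block_bounds(n_items: int, n_procs: int) -> List[Tuple[int, int]]:
    """Half-open [start, stop) global index range owned by each process
    under a block distribution (assumed layout, not taken from PAWS)."""
    base, extra = divmod(n_items, n_procs)
    bounds = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks hold one additional element.
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def mxn_schedule(n_items: int, m: int, n: int) -> List[Tuple[int, int, int, int]]:
    """Messages (src_rank, dst_rank, start, stop) needed to move data
    block-distributed over m processes to a block distribution over n."""
    src = block_bounds(n_items, m)
    dst = block_bounds(n_items, n)
    schedule = []
    for s, (s0, s1) in enumerate(src):
        for d, (d0, d1) in enumerate(dst):
            # A message is needed only where the two owned ranges overlap.
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:
                schedule.append((s, d, lo, hi))
    return schedule


def redistribute(src_chunks: List[list], n: int) -> List[list]:
    """Apply the schedule to local lists that stand in for per-process data.
    `src_chunks` must follow the sizes produced by `block_bounds`."""
    m = len(src_chunks)
    n_items = sum(len(chunk) for chunk in src_chunks)
    src = block_bounds(n_items, m)
    dst = block_bounds(n_items, n)
    dst_chunks = [[None] * (d1 - d0) for d0, d1 in dst]
    for s, d, lo, hi in mxn_schedule(n_items, m, n):
        s0, d0 = src[s][0], dst[d][0]
        # Convert global indices to offsets local to each chunk.
        dst_chunks[d][lo - d0:hi - d0] = src_chunks[s][lo - s0:hi - s0]
    return dst_chunks
```

For example, `redistribute([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]], 3)` moves ten elements held by two processes onto three. In a real distributed setting each schedule entry would correspond to a point-to-point transfer rather than a list slice copy.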
- Published through SciTech Connect.
"Submitted to: High Performance Distributed Computing Conference, August 6-9, 2001, San Francisco, CA".
Mniszewski, S. M.; Fasel, P. K.; Keahey, K.
catkey: 14348499