ScalaTrace [electronic resource] : Scalable Compression and Replay of Communication Traces for High Performance Computing
- Washington, D.C. : United States. Dept. of Energy, 2008.
Oak Ridge, Tenn. : Distributed by the Office of Scientific and Technical Information, U.S. Dept. of Energy.
- Physical Description:
- PDF-file: 31 pages; size: 0.2 Mbytes
- Additional Creators:
- Lawrence Berkeley National Laboratory, United States. Department of Energy, and United States. Department of Energy. Office of Scientific and Technical Information
- Restrictions on Access:
- Free-to-read Unrestricted online access
- Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code/system complexity and long execution times. While many tools to study this behavior have been developed, these approaches either aggregate information in a lossy way through high-level statistics or produce huge trace files that are hard to handle. We contribute an approach that produces communication traces that are orders of magnitude smaller, if not near-constant in size, regardless of the number of nodes, while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events that are capable of extracting an application's communication structure. We further present a replay mechanism for the traces generated by our approach and discuss results of our implementation for BlueGene/L. Given this novel capability, we discuss its impact on communication tuning and beyond. To the best of our knowledge, such a concise and scalable representation of MPI traces, combined with deterministic MPI call replay, is without precedent.
- Report Numbers:
- E 1.99:llnl-jrnl-403992
- Published through SciTech Connect.
Journal of Parallel and Distributed Computing, vol. 69, no. 8, August 1, 2009, pp. 696-710
Schulz, M; Mueller, F; de Supinski, B R; Noeth, M; Ratn, P.