Hardware-and-software-based collective communication on the Quadrics network [electronic resource].
- Published:
- Washington, D.C. : United States. Dept. of Energy, 2001.
Oak Ridge, Tenn. : Distributed by the Office of Scientific and Technical Information, U.S. Dept. of Energy. - Physical Description:
- 18 pages : digital, PDF file
- Additional Creators:
- Los Alamos National Laboratory, United States. Department of Energy, and United States. Department of Energy. Office of Scientific and Technical Information
Access Online
- Restrictions on Access:
- Free-to-read Unrestricted online access
- Summary:
- The efficient implementation of collective communication patterns in a parallel machine is a challenging design effort, that requires the solution of many problems. In this paper we present an in-depth description of how the Quadrics network supports both hardware- and software-based collectives. We describe the main features of the two building blocks of this network, a network interface that can perform zero-copy user-level communication and a wormhole switch. We also focus our attention on the routing and $ow control algorithms, deadlock avoidance and on how the processing nodes are integrated in a global, virtual shared memory. Experimental results conducted on 64-node AlphaServer cluster indicate that the time to complete the hardware-based barrier synchronization on the whole network is as low as 6 ps, with veiy good scalability. Good latency and scalability are also achieved with the software-based synchronization, which takes about 15 ps. With the broadcast, similar performance is achieved by the hardware- and software-based implementations, which can deliver messages of up to 256 b,ytes in 13 ps and can get a sustained bandwidth of 288 Mbyteshec on all the nodes, with wressages larger than 64KB. The hardware-based barrier is almost insensitive to the network congestion, with 93% of the synchronizations taking less than 20 ps. On the other hand, the software based implementation suflers from a signif cant performance degradation. In high load environments the hardware broadcast maintains a reasonably good performance, delivering messages up to 2KB in 200 ps, while the software broadcast suffers from slightly higher latencies inherited by the synchronization mechanism.
- Report Numbers:
- E 1.99:la-ur-01-4692
la-ur-01-4692 - Subject(s):
- Other Subject(s):
- Note:
- Published through SciTech Connect.
01/01/2001.
"la-ur-01-4692"
Submitted to: NCA 2001, [IEEE International Symposium on Network Computing and Applications, October 2001, Boston].
Petrini, F.; Coll, S.; Frachtemberg, E.; Hoisie, A.
View MARC record | catkey: 14654628