Overlapped checkpointing with hardware assist [electronic resource].
- Published
- Washington, D.C. : United States. Dept. of Energy, 2009.
Oak Ridge, Tenn. : Distributed by the Office of Scientific and Technical Information, U.S. Dept. of Energy.
- Additional Creators
- Los Alamos National Laboratory, United States. Department of Energy, and United States. Department of Energy. Office of Scientific and Technical Information
Access Online
- Restrictions on Access
- Free-to-read Unrestricted online access
- Summary
- We present a new approach to handling the demanding I/O workload incurred during checkpoint writes in High Performance Computing. Prior efforts to improve performance have been bound primarily by the mechanical limitations of the hard drive. Our research surpasses this limitation by providing a method to (1) write checkpoint data to a high-speed, non-volatile buffer and (2) asynchronously write this data to permanent storage while computation resumes. This removes the hard drive from the critical data path because our I/O-node-based buffers isolate the compute nodes from the storage servers. This solution is feasible because of industry-wide declines in the cost of high-capacity, non-volatile storage technologies. Testing was conducted on a small-scale cluster to prove the design, and then scaled up at Los Alamos National Laboratory. Results show a definitive speedup factor for select workloads over writing directly to a typical global parallel file system, the Panasas ActiveScale File System.
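The two-step method in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, file naming, and the use of ordinary directories to stand in for the non-volatile buffer and the permanent storage are all assumptions made for illustration.

```python
import os
import shutil
import tempfile
import threading


def write_checkpoint(state: bytes, step: int,
                     buffer_dir: str, storage_dir: str) -> threading.Thread:
    """Write a checkpoint to a fast buffer, then drain it in the background.

    buffer_dir and storage_dir are hypothetical stand-ins for the
    non-volatile buffer and the permanent storage named in the summary.
    """
    name = f"ckpt_{step:06d}.bin"
    buffered = os.path.join(buffer_dir, name)

    # Step 1: synchronous write to the buffer. The caller blocks only here.
    with open(buffered, "wb") as f:
        f.write(state)
        f.flush()
        os.fsync(f.fileno())

    # Step 2: asynchronous copy to the slower permanent storage.
    def drain() -> None:
        partial = os.path.join(storage_dir, name + ".partial")
        shutil.copyfile(buffered, partial)
        # Rename so readers never see a half-written checkpoint.
        os.replace(partial, os.path.join(storage_dir, name))
        # Free buffer space once the copy is complete.
        os.remove(buffered)

    drainer = threading.Thread(target=drain)
    drainer.start()
    return drainer


def run_demo() -> bytes:
    """Write one checkpoint and return what landed in permanent storage."""
    with tempfile.TemporaryDirectory() as buf, \
            tempfile.TemporaryDirectory() as store:
        drainer = write_checkpoint(b"simulation state", 1, buf, store)
        # Computation would resume here while the drain proceeds.
        drainer.join()
        with open(os.path.join(store, "ckpt_000001.bin"), "rb") as f:
            return f.read()
```

In this sketch the caller waits only for the buffer write; the copy to permanent storage overlaps with whatever work follows, which is the overlap the title refers to.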
- Report Numbers
- E 1.99:la-ur-09-04995
E 1.99: la-ur-09-4995
la-ur-09-4995
la-ur-09-04995
- Other Subject(s)
- Note
- Published through SciTech Connect.
01/01/2009.
"la-ur-09-04995"
" la-ur-09-4995"
IEEE Cluster 2009 - IASDS Workshop ; September 4, 2009 ; New Orleans, LA.
Wang, Jun; Mitchell, Christopher J; Nunez, James A.
- Funding Information
- AC52-06NA25396
catkey: 14653943