Scaling Up Data-Centric Middleware on a Cluster Computer [electronic resource].
- Washington, D.C. : United States. Dept. of Energy, 2005.
Oak Ridge, Tenn. : Distributed by the Office of Scientific and Technical Information, U.S. Dept. of Energy.
- Physical Description:
- PDF-file: 14 pages; size: 0.3 Mbytes
- Additional Creators:
- Lawrence Berkeley National Laboratory
United States. Department of Energy
United States. Department of Energy. Office of Scientific and Technical Information
- Data-centric workflow middleware systems are workflow systems that treat data as first-class objects alongside programs. These systems improve the usability, responsiveness, and efficiency of workflow execution on cluster (and grid) computers. In this work, we explore the scalability of one such system, GridDB, on cluster computers. We measure the performance and scalability of GridDB in executing data-intensive image processing workflows from the SuperMACHO astrophysics survey on a large cluster computer. Our first experimental study concerns the scale-up of GridDB. We make a rather surprising finding: although the middleware system issues many queries and transactions to a DBMS, file system operations present the first-tier bottleneck. We circumvent this bottleneck and increase the scalability of GridDB by more than 2-fold on our image processing application (up to 128 nodes). In a second study, we demonstrate the sensitivity of GridDB performance (and therefore application performance) to characteristics of the workflows being executed. To manage these sensitivities, we provide guidelines for trading off the costs and benefits of GridDB at a fine granularity.
- Published through SciTech Connect.
Presented at: Supercomputing 05, Seattle, WA, United States, Nov 12 - Nov 18, 2005.
Liu, D T; Franklin, M J; Abdulla, G M; Garlick, J.
- Funding Information:
catkey: 13823909