Scalable Node Monitoring [electronic resource].
- Published:
- Los Alamos, N.M. : Los Alamos National Laboratory, 2012.
Oak Ridge, Tenn. : Distributed by the Office of Scientific and Technical Information, U.S. Dept. of Energy.
- Additional Creators:
- Los Alamos National Laboratory and United States. Department of Energy. Office of Scientific and Technical Information
- Restrictions on Access:
- Free-to-read Unrestricted online access
- Summary:
- The project had two goals: (1) build a high-performance computer; and (2) create a tool that monitors node applications within the Component Based Tool Framework (CBTF), using code from the Lightweight Data Metric Service (LDMS). The project matters because (1) there is a need for a scalable, parallel tool to monitor nodes on clusters; and (2) new LDMS plugins must be easy to add to the tool. CBTF is scalable and adjusts to different topologies automatically. It uses the MRNet (Multicast/Reduction Network) mechanism for information transport. CBTF is flexible and general enough to be used for any tool that needs to perform a task on many nodes, and its components are reusable and easily added to a new tool. CBTF has three levels: (1) the frontend node, which interacts with users; (2) filter nodes, which filter or concatenate information from the backend nodes; and (3) backend nodes, where the actual work of the tool is done. LDMS is a tool for monitoring nodes. Ltool is the name of the tool we derived from LDMS. It is dynamically linked and includes the following components: Vmstat, Meminfo, Procinterrupts, and more. It works as follows: the Ltool command is run on the frontend node; Ltool collects information from the backend nodes; the backend nodes send information to the filter nodes; and the filter nodes concatenate the information and send it to a database on the frontend node. Ltool is useful for monitoring nodes on a cluster because the overhead of running it is not particularly high and it scales automatically to any cluster size.
- Report Numbers:
- E 1.99:la-ur-12-23629
la-ur-12-23629
- Subject(s):
- Other Subject(s):
- Note:
- Published through SciTech Connect.
07/30/2012.
"la-ur-12-23629"
Computing and Information Technology Student Mini Showcase ; 2012-08-02 - 2012-08-02 ; Los Alamos, New Mexico, United States.
Drotar, Alexander P.; Quinn, Erin E.; Sutherland, Landon D.
- Funding Information:
- AC52-06NA25396
catkey: 14342793
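
- Illustrative sketch (not part of the report):
The Summary field describes a three-level data flow: backend nodes collect metrics, filter nodes concatenate them, and the frontend node writes them to a database. The Python sketch below simulates that flow in a single process to make the shape of the pipeline concrete. It is not Ltool, CBTF, MRNet, or LDMS code and uses none of their APIs. Every name in it (`backend_collect`, `filter_concat`, `frontend_store`, `run_once`, the `metrics` table, and the sampler names) is hypothetical, the samplers return fixed stand-in values rather than real Vmstat or Meminfo readings, and an in-memory SQLite database is assumed in place of whatever database the report used.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

# One sample: (node name, metric name, value). Hypothetical record layout.
Sample = Tuple[str, str, float]
Samplers = Dict[str, Callable[[], float]]


def backend_collect(node: str, samplers: Samplers) -> List[Sample]:
    """Backend level: run every sampler plug-in for one node."""
    return [(node, name, read()) for name, read in samplers.items()]


def filter_concat(batches: List[List[Sample]]) -> List[Sample]:
    """Filter level: concatenate the batches received from child nodes."""
    merged: List[Sample] = []
    for batch in batches:
        merged.extend(batch)
    return merged


def frontend_store(db: sqlite3.Connection, samples: List[Sample]) -> None:
    """Frontend level: write the merged samples to the database."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS metrics (node TEXT, metric TEXT, value REAL)"
    )
    db.executemany("INSERT INTO metrics VALUES (?, ?, ?)", samples)
    db.commit()


def run_once(topology: List[List[str]], samplers: Samplers) -> sqlite3.Connection:
    """Simulate one collection pass.

    topology holds one inner list of backend node names per filter node.
    """
    db = sqlite3.connect(":memory:")  # assumption: stand-in for the real database
    per_filter = [
        filter_concat([backend_collect(node, samplers) for node in group])
        for group in topology
    ]
    frontend_store(db, filter_concat(per_filter))
    return db


if __name__ == "__main__":
    # Stand-in samplers with fixed values; adding a "plug-in" is one more entry.
    demo_samplers: Samplers = {
        "meminfo.free_kb": lambda: 1024.0,
        "vmstat.procs_running": lambda: 2.0,
    }
    demo_db = run_once([["node1", "node2"], ["node3"]], demo_samplers)
    for row in demo_db.execute("SELECT node, metric, value FROM metrics"):
        print(row)
```

In this sketch a new sampler is added by inserting one more entry into the dictionary, which loosely mirrors the stated goal that new plug-ins be easy to add. Changing the nested `topology` list changes how many filter and backend nodes take part without touching the three level functions.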