MPI Performance Monitor on Windows

Chad Gregory

Parallel Processing Project

Fall 1998

Abstract

Parallel processing performance monitors and debuggers allow programmers to visualize what is happening on each node of their programs. Brigham Young University's Windows Message Passing Interface (MPI) also needs such a performance monitor; monitors already exist for PVM and for Intel Paragon systems. The MPI Performance Monitor (MPIPM) is the first-generation monitor for the Windows MPI, and it monitors memory and processor usage. This data is shown in a Graphical User Interface (GUI).

Introduction

After the design and programming of an MPI program are complete, the user needs to debug and fine-tune the MPI program. A common way to do this with multiple machines is to print statements to standard out, which is redirected to the root node of the MPI program. This method is difficult to use: because each of the nodes can send back information at the same instant, one message can end up printed inside another; the ordering of the messages is not guaranteed to be accurate; and the approach is not scalable in the least.

Such is the case with the Windows MPI developed by the Networked Computing Lab at Brigham Young University. Windows MPI was developed to allow supercomputer users to run their applications on Windows-based systems. Built into the architecture was a "snooper" thread, a thread that listened on a socket and could pass back information about the MPI program or system resources. I chose to use that thread to report back memory and processor usage whenever they were requested. With these metrics, an inference could be made about which nodes were waiting, communicating, or processing.

Discussion of Related Work

Many universities have implemented parallel processing monitors. These include the XAB system at Carnegie Mellon, the XPVM system for PVM under X Windows, and the monitor for the Intel Paragon.

XAB

The interface below shows the XAB system in action. (Carnegie Mellon)

The XAB system is a simple monitoring tool that records and displays event traces of PVM programs. It requires a recompilation of code in order to use the monitor, because macros are used to replace PVM calls. These macros add one or two events for every PVM call. The XAB system sends these extra messages to a process called abmon, which in turn can pipe the events to a window or store them in a trace file. The XAB monitor process becomes a bottleneck for large applications, so scalability is low. XAB, as shown by the interface above, is largely a text interface. It stores events in the order they were received, which is a possibly inconsistent view of the computation. (Carnegie Mellon)

XPVM

XPVM is a graphical console and monitor for the Parallel Virtual Machine (PVM) under X Windows. It improved upon XAB in both interface and functionality. XPVM offers console commands in pull-down menus: via the interface the user can turn tracing on or off, spawn tasks using a dialog box that prompts for all inputs, add hosts, delete hosts, and configure the virtual machine.

Perhaps the best feature of XPVM is its interface and the data represented there. The interface shows several active views for monitoring the execution of PVM programs. Interactions among tasks, network architecture, current state vs. time, and the last event traced are all shown using clever diagrams from one console.

Instead of providing the extra messages through macros and requiring a recompilation, XPVM uses the PVM library, version 3.3 and above, to communicate with the nodes. No recompilation is necessary to enable the debugging and performance tuning offered by XPVM.

XPVM supports real-time analysis as well as post-mortem playback from saved trace files. (Ludwig)

The following graphic shows what the XPVM interface looks like:

[Figure: XPVM interface screenshot] (Forstner)

Intel Paragon

The Intel Paragon machine was built around 1993 and included performance software called "Paragraph." The Paragraph system could draw a picture of a cluster, or in its terms a "partition." Since each partition could be used separately, Paragraph could verify whether a specified partition was free. Everything else done by the Paragraph system was post-mortem.

Profiles, or output trace files, can be generated for each process running on the mesh. Paragraph could take those profiles and visualize the behavior and performance of Paragon applications, summarizing overall performance and showing a "variety of displays to visualize the performance of a parallel application." (Mionescu)

Discussion of MPI Performance Monitor

In order to use the MPIPM, you must follow these steps:

1) Code your MPI program using Windows NT and the BYU Windows MPI.

2) Start the MPI Monitor and input the root node's machine name in the edit field.

3) Retrieve the MPI applications running on that machine and choose one of them.

4) Choose the "Monitor" button; when you are finished, click the close button.

Code overview:

 1) The MPI Monitor connects to the service on the root machine and requests the parallel programs running there.

Request Code

2) The MPIPM then duplicates each of the "snooper" sockets that the service maintains for the requested program.

Request Code

Service Code

3) When a request is made, the node samples the system resources for 0.400 seconds to get an accurate picture of the processor and memory usage. It then calculates the mean, maximum, and minimum values for those metrics.

System Metrics code

4) The MPIPM then queries the state of the node via the socket and displays that state in a GUI.

Request Metrics and Display Code

Comparison

Name                          Post-Mortem   Runtime   Compiled into library
MPI Performance Monitor           --           X             --
XAB                               X            X             X
XPVM                              X            X             --
Paragraph for Intel Paragon       X            --            --

After looking at the differences between my GUI and the others, I realized that I had very little MPI information available in the interface. I did have some neat features that no one else had, like memory usage and processor utilization. At the same time, I didn't have communication information, network topology graphs, time plots of information, or logging support.

In the future, I would like to add network traffic information; synchronization of multiple processes per machine, so that the same machine isn't queried more than once for resource information; MPI communication statistics; and the ability to "nice" a machine. I would also like to give the MPI class access to these statistics for heuristic information. Currently I support only 1 through 16 nodes; this restriction is unacceptable.

Results

Overall Project

I am content that the program works completely and is running. I met all of the criteria I set forth in the project proposal, and met a stretch goal of reporting back processor utilization.

Performance of the Performance Monitor

I was concerned about how much monitoring would affect the program's performance. Although there is a performance hit whenever the system resource information is requested, the hit is minimal on the MPI nodes: they essentially average four longs and send two integers back to the MPI Performance Monitor. The biggest performance decrease happens on the machine where the MPI Monitor itself is running, because of the graphical interface.

Another concern dealt with reentrancy. I wanted the code to support stopping and restarting the MPI Monitor on the same program without affecting the MPI program.

Bottleneck

In the project proposal, I said, "I think that this tool will be useful, or at least interesting. I surmise that there will be one or two nodes that will be a bottleneck for the rest of the program." This tool is indeed useful for monitoring the performance of each of the MPI nodes running an MPI application. I tried various programs and each of them could be improved somewhat. In large part, the algorithms used could be rethought upon viewing the output of the MPI Performance Monitor.

Conclusion

The MPI Performance Monitor was a success. It has a ways to go to reach the caliber of XPVM, but it is definitely on the right track.

References

1) European Synchrotron Radiation Facility

http://www.esrf.fr/computing/cs/nice/user/soft/xpvm.html

2) http://wwwbode.informatik.tu-muenchen.de/~wismuell/publications/sup95/node6.html

3) http://www.epm.ornl.gov/pvm/EuroPVM97/sld023.htm

4) http://www.netlib.org/utk/icl/xpvm/xpvm.html

5) http://www.cs.ucsb.edu/~mionescu/Courses/CS290I/paragon/index.html

6) http://www.cs.cmu.edu/afs/cs.cmu.edu/project/nectar-adamb/web/Xab.html