The goal of ASKALON is to simplify the development and optimization of applications that can harness the power of Grid computing. The ASKALON project builds a novel environment based on innovative tools, services, and methodologies to make Grid application development and optimization for real applications an everyday practice. For more information refer to ASKALON.
ASKALON started as a tool set for cluster and Grid computing before it was extended to a fully fledged development environment for the Grid. Distributed and parallel computing have been investigated for many years, but recently research on this topic has gained new impetus due to the explosive growth of the Internet, the emergence of cluster and Grid computing infrastructures, and the availability of portable programming languages such as Java. Although rapid advances in microprocessor technology, high-speed networks, and heterogeneous distributed computing architectures are bringing teraflops performance within grasp, the software infrastructure for distributed and parallel systems has not kept pace. Substantial advances in the field of programming languages and methods enable the programmer to write effective programs at a machine-independent level. However, as parallelization and optimization of programs are far from being automated, there is a clear need for useful, efficient, and accurate tools to support this process. The following tools are being developed within the framework of ASKALON, a programming environment and tool set for cluster and Grid computing:
- SCALEA: a performance instrumentation, tracing, and profiling tool
- AKSUM: an automatic performance bottleneck analysis tool
- Performance Prophet: a modeling and performance prediction system
- ZENTURIO: an automatic performance experiment management system
I developed the Weight Finder, an advanced and highly optimized profiler for F77 programs. The purpose of the Weight Finder is to find the computationally intensive code portions by runtime profiling. Furthermore, the Weight Finder obtains concrete values for loop iteration counts, branching probabilities, and statement execution counts. The Weight Finder is an integrated tool of the Vienna Fortran Compilation System. It allows optional instrumentation for different profile data. The distortion of profile data is minimized by reducing the runtime and memory overhead of the instrumentation. This is achieved by applying several optimized instrumentation techniques.
Moreover, I am supervising the Scala project. Lack of effective performance analysis environments is a major barrier to the broader use of distributed and parallel computing. Many existing performance tools collect and visualize performance data for programs that have been generated and transformed by a compiler. It is very difficult for the programmer to meaningfully relate such performance data back to the input program. We developed a novel framework for portable instrumentation in order to selectively monitor the performance of distributed and parallel programs. Among other things, a rich set of array information can be collected. Code transformations are recorded in order to maintain the relationship between the performance data collected for compiler-generated code and the input program. Performance overhead introduced by a transformation system can be separately measured and displayed in relation to the performance behavior of the input program. Many performance metrics and statistics are computed. Performance data can be filtered, summarized, and analyzed at various levels of detail.
I am continuously developing P3T, a state-of-the-art performance estimator for High Performance Fortran programs that assists both programmers and parallelizing compilers in the search for efficient data distribution strategies and profitable program transformations. It detects performance bottlenecks in the program, identifies the cause of performance problems, and relates it back to the source code. Four of the most critical performance aspects of parallel programs are estimated: load balance, cache locality, communication overhead, and computation overhead. The P3T is an integrated tool of the Vienna Fortran Compilation System, which enables the estimator to aggressively exploit considerable knowledge about the compiler's analysis information and code restructuring strategies. The P3T's graphical user interface directs the user to bottlenecks in a computation that prevent the program from performing well, and allows the user to filter and visualize performance data at various levels of detail.
Most recently my research has focused on extending the P3T by symbolic analysis, adding more performance parameters to it, and porting the P3T to other distributed memory architectures (e.g. Meiko CS-2, Cenju 4, Network of Workstations, Beowulf cluster, etc.). This work is done as part of the Aurora Tools Project.
I am continuously developing JavaSymphony, a programming paradigm for locality-oriented distributed and parallel Java applications. Most Java-based systems that support portable parallel and distributed computing either require the programmer to deal with intricate low-level details of Java, which can be a tedious, time-consuming, and error-prone task, or prevent the programmer from controlling locality of data. In contrast, JavaSymphony provides a software infrastructure for wide classes of heterogeneous systems ranging from small-scale cluster computing to large-scale wide-area meta-computing. The software infrastructure of JavaSymphony is written entirely in Java and runs on any standard-compliant Java virtual machine.
In contrast to most existing systems, JavaSymphony provides the programmer with the flexibility to control data locality and load balancing by explicit mapping of objects to computing nodes. Virtual architectures are specified to impose a virtual hierarchy on a distributed system of physical computing nodes. Objects can be mapped and dynamically migrated to arbitrary components of virtual architectures. A high-level API to hardware/software system parameters is provided to control mapping, migration, and load balancing of objects. Objects can interact through synchronous, asynchronous and one-sided method invocation. Selective remote classloading may reduce the overall memory requirement of an application. Moreover, objects can be made persistent by explicitly storing and loading objects to/from external storage.
A prototype of the JavaSymphony software infrastructure has been implemented. Preliminary experiments on a heterogeneous cluster of workstations demonstrate reasonable performance for a small test program.
For two years I have been a workpackage leader in the APART Esprit IV Working Group on Automatic Performance Analysis: Resources and Tools, which focuses on automatic performance modeling and analysis. We have developed ASL (APART Performance Property Specification Language), which is a language to describe performance problems for distributed and parallel systems. This language allows the description of performance-related data through the provision of an object-oriented specification model and supports the definition of performance properties in a novel formal notation. Performance-related data can either be static (gathered at compile-time, e.g. code regions, control and data flow information, predicted performance data, etc.) or dynamic (gathered at run-time, e.g. timing events, performance summaries, etc.) and is used as a basis for describing performance properties. A performance property (e.g. load imbalance, communication, cache misses, etc.) characterizes a specific type of performance behavior which may be present in a program. Checks for which properties are present in (the execution of) a program are given by a set of conditions defined over the performance-related data. Conditions have an associated confidence level which indicates the degree of certainty in the diagnosis of the presence of the performance property. Performance properties also have an associated severity measure (usually an expression), the magnitude of which specifies the importance of the property in terms of its contribution to limiting the performance of the program. The severity can be used to focus effort on the important performance issues during the (manual or automatic) performance tuning process.
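The relationship between a property's condition, confidence, and severity can be illustrated with a small sketch. This is not ASL syntax, which is not reproduced here. The class `PerformanceProperty`, the `imbalance` formula, the 10% threshold, and the timings are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PerformanceProperty:
    """Hypothetical model of a property: a condition plus confidence and severity."""
    name: str
    condition: Callable[[Sequence[float]], bool]   # does the property hold?
    confidence: float                              # certainty of the diagnosis, in [0, 1]
    severity: Callable[[Sequence[float]], float]   # how much it limits performance


def imbalance(times: Sequence[float]) -> float:
    """Share of the slowest process's time that an average process spends idle."""
    slowest = max(times)
    mean = sum(times) / len(times)
    return (slowest - mean) / slowest if slowest > 0 else 0.0


# Assumed threshold: report load imbalance only when it exceeds 10%.
load_imbalance = PerformanceProperty(
    name="load imbalance",
    condition=lambda times: imbalance(times) > 0.10,
    confidence=1.0,  # assumption: measured (dynamic) data is fully trusted
    severity=imbalance,
)


def analyze(properties, times):
    """Return (name, confidence, severity) for each property that holds,
    ordered from most to least severe."""
    found = [(p.name, p.confidence, p.severity(times))
             for p in properties if p.condition(times)]
    return sorted(found, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    per_process_seconds = [10.0, 10.0, 10.0, 20.0]  # made-up timings
    print(analyze([load_imbalance], per_process_seconds))
```

Sorting by severity mirrors the idea of focusing tuning effort on the properties that limit performance most.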
Since 1994, I have been working in the field of symbolic analysis to support parallelizing compilers and performance tools. Many analyses and optimizations of distributed and parallel programs benefit from techniques that are able to analyze program unknowns, compute the values of program variables or symbolic expressions, and determine the condition under which control flow reaches a program statement. I have started a symbolic analysis project that focuses on the following three research goals:
- Basic Symbolic Analysis:
- Symbolic Evaluation Framework:
- Applied Symbolic Analysis:
- To be filled in ...
I am working in the field of scalability analysis of applications, algorithms, and codes for distributed and parallel systems. A major difficulty in restructuring and optimizing distributed and parallel programs is how to compare performance over a range of system and problem sizes. Execution time varies with system and problem size, and an initially fast implementation may become slow when system and problem size scale up. This work introduces the concept of range comparison. Unlike conventional execution time comparison, in which performance is compared for a particular system and problem size, range comparison compares the performance of programs over a range of ensemble and problem sizes via scalability and performance crossing point analysis. A novel algorithm has been developed to predict the crossing point automatically. The correctness of the algorithm has been proved, and a methodology has been developed to integrate range comparison into restructuring compilations for data-parallel programming. A preliminary prototype of the methodology has been implemented and tested under the VFC compiler. Experimental results demonstrate that range comparison is feasible and effective. It is an important asset for program evaluation, restructuring compilation, and distributed and parallel programming.
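To make the notion of a performance crossing point concrete, here is a minimal sketch that finds the first system size at which the faster of two programs changes. It is a brute-force scan over a range of processor counts, not the prediction algorithm mentioned above. The function `crossing_point` and both execution-time models are hypothetical and made up for illustration.

```python
from typing import Callable, Optional


def crossing_point(time_a: Callable[[int], float],
                   time_b: Callable[[int], float],
                   sizes: range) -> Optional[int]:
    """Return the first size at which the faster of the two programs changes.

    Brute-force scan over `sizes`. Returns None if the same program stays
    faster over the whole range. Sizes where both times are equal are skipped.
    """
    previous_sign = 0
    for p in sizes:
        diff = time_a(p) - time_b(p)
        sign = (diff > 0) - (diff < 0)   # +1 if B is faster, -1 if A is faster
        if sign != 0:
            if previous_sign != 0 and sign != previous_sign:
                return p
            previous_sign = sign
    return None


# Made-up execution-time models (seconds) as functions of processor count p.
def program_a(p: int) -> float:
    return 100.0 / p + 0.5 * p   # less computation, overhead grows quickly with p


def program_b(p: int) -> float:
    return 160.0 / p + 0.1 * p   # more computation, overhead grows slowly with p


if __name__ == "__main__":
    print(crossing_point(program_a, program_b, range(1, 65)))
```

With these made-up models, program A is faster up to 12 processors and program B is faster from 13 processors on, so a comparison at a single small system size would pick the implementation that scales worse.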
I am the leader of a workpackage on Identification and Formalization of Knowledge for automatic performance analysis. This research is conducted as part of the Esprit Working Group APART (Automatic Performance Analysis: Resources and Tools), which is funded by the European Commission. In my working group we explore all issues in automatic performance analysis support for current and future distributed and parallel machines and programming models.
I am actively participating in the Esprit Working Group EuroTools, which is funded by the European Commission and was created in April 1998. Its main objective is to increase the use of European HPCN software, both inside and outside Europe. The EuroTools Working Group aims at establishing collaborations among designers, developers, vendors, and users of European software tools. These collaborations will help people interested in HPCN technology to communicate, exchange ideas, and identify emerging trends in HPCN software. Focused technical meetings are organized to help developers exchange ideas on a specific technical subject and to give a better evaluation of user needs and requirements.
I am also developing novel techniques for optimizing communication, which have resulted in a general framework for cost-driven, machine- and problem-sensitive communication placement for distributed and parallel programs. Furthermore, I started an effort to evaluate lightweight threads for improving the performance of data and task parallel programs.
I am the leader of the Aurora Tools project, where we develop and extend a variety of tools including a symbolic debugger, a performance measurement and analysis system, a Fortran90 to HPF+ translator, a performance estimator, and a graphical user interface that combines all Aurora tools.
See also the list of current research projects in which I participate.