/ Harvey Abramson / Professor
/ David S.L. Wei / Associate Professor
/ Kyung-Goo Doh / Assistant Professor
This year was the second year for the Language Processing Systems Laboratory. As such, the achievement consisted in continuing existing lines of research by each of the lab members (as exemplified in the publication and achievement list) and also continuing two group projects, namely Towards a Programming Language Designer's Workbench based on Logic Programming and Action Semantics, and Granularity Optimization and Scheduling on Massively Parallel Computer Systems, which are stated in what follows.
Towards a Programming Language Designer's Workbench based on Logic Programming and Action Semantics
Language development usually proceeds in the following stages: design, specification, prototyping, and implementation (with no strict ordering). We call it a programming language life cycle. Many programming languages have been developed based on the life cycle. However, their specifications are usually written in English: such specifications can be very misleading for both implementors and users since the designers' original intention may be misunderstood. Even when a programming language is formally specified (normally based on denotational, operational, or axiomatic semantics frameworks), it is still difficult to pass on the designer's intention to others because it rarely gives hints on how to implement the language. With this in mind, we are developing a Programming Language Designer's Workbench based on a recently developed framework, action semantics. The workbench allows a language designer to design and implement a language in one setting. Thus, the language designers can actually confirm that the designed language is in fact the one he intended. This consistency check is indeed important because it eliminates potential design errors and possible misunderstanding among designers and implementors.
The project has been carried out in three differrent directions: (1) developing theories necessary to realize programming language designer's workbench; (2) designing methodology related to actually building up the general-purpose compiler generator based on action semantics framework; (3) studying semantics of many constructs of programming languages (imperative, functional, logic, object-oriented, etc.) to see how action semantics specifications affect the understanding of semantics of programming languages as well as the design of new languages.
Granularity Optimization and Scheduling on Massively Parallel Computer Systems
Despite recent advances in massively parallel processing (MPP), resource management remains a critical issue for achieving high performance on MPP architectures. Resource management involves code and data partitioning, scheduling and control of program execution. How to manage these activities to achieve the fastest parallel execution for a single application is an important and challenging problem that so far has not been satisfactorily resolved. CASS, which stands for Clustering And Scheduling System, is a compiler optimization tool that provides facilities for automatic partitioning and scheduling of parallel programs on distributed memory architectures. Central to the design of CASS is grain-size optimization. The high communication overhead in current MPP architectures imposes a minimum grain-size (i.e., computation-to-communication ratio) below which performance degrades significantly. Using task clustering techniques, CASS restructures a parallel program to one whose grain-size matches that of the target architecture, while minimizing overall parallel execution time.
The completed work of this research in the past fiscal year includes:
Refereed Journal Papers
Abstract: We consider the problems of selection, routing and sorting on an $n$-star graph (with $n!$ nodes), an interconnection network which has been proven to possess many special properties. We identify a tree like subgraph (which we call as a `$(k,1,k)$ chain network') of the star graph which enables us to design efficient algorithms for the above mentioned problems. We present an algorithm that performs a sequence of $n$ prefix computations in $O(n^2)$ time. This algorithm is used as a subroutine in our other algorithms. We also show that sorting can be performed on the $n$-star graph in time $O(n^{3})$ and that selection of a set of uniformly distributed $n$ keys can be performed in $O(n^2)$ time with high probability. Finally, we also present a deterministic (non oblivious) routing algorithm that realizes any permutation in $O(n^{3})$ steps on the $n$-star graph. There exists an algorithm in the literature that can perform a single prefix computation in $O(n\lg n)$ time. The best known previous algorithm for sorting has a run time of $O(n^3\lg n)$ and is deterministic. To our knowledge, the problem of selection has not been considered before on the star graph.
Abstract: This paper addresses the problem of scheduling parallel programs represented as directed acyclic task graphs for execution on distributed memory parallel architectures. Because of the high communication overhead in existing parallel machines, a crucial step in scheduling is task clustering, the process of coalescing fine grain tasks into single coarser ones so that the overall execution time is minimized. The task clusting problem is NP-hard, even when the number of processors is unbounded and task duplication is allowed. A simple greedy algorithm is presented for this problem which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. Indeed, the quality of the schedule improves as the granularity of the task graph becomes larger. For example, if the granularity is at least 1/2, the makespan of the schedule is at most 5/3 times optimal. For a task graph with $n$ tasks and $e$ inter-task communication constraints, the algorithm runs in $O(n(n \lg n + e))$ time, which is $n$ times faster than the currently best known algorithm for this problem. Similar algorithms are develpoed that produce: (1) optimal schedlues for coarse grain graphs; (2) 2-optimal schedules for trees with no task duplication; and (3) optimal schedules for coarse grain trees with no task duplication.
Abstract: We consider the problems of sorting and routing on some unbounded interconnection networks, namely hypercube and de Bruijn network. We first present two efficient implementations of quicksort on the hypercube. The first algorithm sorts $N$ items on an $N$-node hypercube, one item per node, in $O(\frac{\log^{2}N}{\log\log N})$ time with high probability, while the other one sorts N items on an $(N/\log N)$-node hypercube, $\log N$ items per node, in $O(\log^{2}N)$ time with high probability, which achieves optimal speedup in the sense of $PT$ product. Both algorithms beat the previously best known parallel quicksort that runs in $O(\log^{2} N)$ expected time on a butterfly of $N$ nodes. We also present a deterministic (nonoblivious) permutation routing algorithm which runs in $O(d \cdot n^{2})$ time on a $d$-ary de Bruijn network of $N=d^{n}$ nodes. To the best of our knowledge, this routing algorithm is so far the fastest deterministic one for the de Bruijn network of arbitrary degree. The previously best known one runs in $O((\log d) \cdot d \cdot n^{2})$ time. All algorithms presented are simple, the constants hidden behind the big Oh being small.
Abstract: A simple greedy algorithm is presented for task clustering with duplication (or recomputation) which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. For a task graph with $n$ tasks and $e$ inter-task communication constraints, the algorithm runs in $O(n(n \lg n + e))$ time, which is $n$ times faster than the currently best known algorithm for this problem. Similar algorithms are develpoed that produce: (1) optimal schedlues for coarse grain graphs; (2) 2-optimal schedules for trees with no task duplication; and (3) optimal schedules for coarse grain trees with no task duplication.
Abstract: We show that any deterministic oblivious routing scheme for the permutation routing on a $d$-ary de Bruijn network of $N=d^{n}$ nodes, in the worst case, will take $\Omega(\sqrt{N})$ steps under the {\it single-port} model. This lower bound improves the results shown in \cite{BH82} and \cite{KKT90} provided $d$ is not a constant. A matching upper bound is also given. We also present an efficient general sorting algorithm which runs in $O((\log d) \cdot d \cdot n^{2})$ time for directed de Bruijn network with $d^{n}$ nodes, degree $d$ and diameter $n$.
Abstract: A distinguishing characteristic of action semantics is its facet system, which defines the variety of information flows in a language definition. The facet system can be analyzed to validate the well-formedness of a language definition, to infer the typings of its inputs and outputs, and to calculate the operational semantics of programs. We present a single framework for doing all of the above. The framework exploits the internal subsorting structure of the facets so that sort checking, static analysis, and operational semantics are related, sound instances of the same underlying analysis. The framework also suggests that action semantics's extensibility can be understood as a kind of ``weakening rule'' in a ``logic'' of actions. In this paper, the framework is used to perform type inference on specific programs, to justify meaning-preserving code transformations, and to ``stage'' an action semantics definition of a programming language into a static semantics stage and a dynamic semantics stage.
Abstract: The compiler automatically generated from action-semantics directed compiler generators first translates a source program to its action denotation, and then to a target code. To improve the efficiency of the target code, it is essential to transform the action denotation into the proper one leading to a better target code. In this paper, an automatic action transformation based on partial evaluation is presented, along with methodology to analyze action semantics equations, allowing necessary analyses to be performed at compiler-generation time before source programs are given.
Grain-Size Optimization and Task Scheduling on Distributed Memory Parallel Architectures, New Jersey Institute of Technology. Thesis advisor: David S. L. Wei (with Dr. Micheal Palis).
Member Program Committee 'Natural Language Understanding and Logic Programming 5, Lisbon.
Refereed for ACM-SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation.