
# Language Processing Systems Laboratory

/ Harvey Abramson / Professor
/ David S.L. Wei / Associate Professor
/ Kyung-Goo Doh / Assistant Professor

This year was the second year for the Language Processing Systems Laboratory. Accordingly, our achievements consisted of continuing each lab member's existing lines of research (as exemplified in the publication and achievement list) and of continuing two group projects, namely Towards a Programming Language Designer's Workbench based on Logic Programming and Action Semantics, and Granularity Optimization and Scheduling on Massively Parallel Computer Systems, which are described below.

Towards a Programming Language Designer's Workbench based on Logic Programming and Action Semantics

Language development usually proceeds in the following stages: design, specification, prototyping, and implementation (with no strict ordering). We call these stages the programming language life cycle. Many programming languages have been developed according to this life cycle. However, their specifications are usually written in English, and such specifications can be very misleading for both implementors and users, since the designers' original intention may be misunderstood. Even when a programming language is formally specified (normally in a denotational, operational, or axiomatic semantics framework), it is still difficult to pass on the designer's intention to others, because such a specification rarely gives hints on how to implement the language. With this in mind, we are developing a Programming Language Designer's Workbench based on a recently developed framework, action semantics. The workbench allows a language designer to design and implement a language in one setting. Thus, a language designer can actually confirm that the designed language is in fact the intended one. This consistency check is important because it eliminates potential design errors and possible misunderstandings between designers and implementors.

The project has been carried out in three different directions: (1) developing theories necessary to realize a programming language designer's workbench; (2) designing a methodology for actually building a general-purpose compiler generator based on the action semantics framework; (3) studying the semantics of many constructs of programming languages (imperative, functional, logic, object-oriented, etc.) to see how action semantics specifications affect the understanding of the semantics of programming languages as well as the design of new languages.
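To make the idea of semantics-directed prototyping concrete, the following is a minimal illustrative sketch (not the laboratory's actual workbench, and using ordinary compositional semantic equations rather than action notation): a toy imperative language whose meaning functions for expressions and statements are transcribed directly into executable code, so a designer can run the specification itself.

```python
# Illustrative sketch: a toy language prototyped straight from its
# semantic equations. AST nodes are plain tuples; all names here are
# invented for the example.

def evaluate(expr, env):
    """Meaning of an expression: [[e]] env -> value."""
    kind = expr[0]
    if kind == "num":                       # [[n]] env = n
        return expr[1]
    if kind == "var":                       # [[x]] env = env(x)
        return env[expr[1]]
    if kind == "add":                       # [[e1 + e2]] env = [[e1]] env + [[e2]] env
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    raise ValueError(f"unknown expression: {kind}")

def execute(stmt, env):
    """Meaning of a statement: [[s]] env -> env'."""
    kind = stmt[0]
    if kind == "assign":                    # [[x := e]] env = env[x -> [[e]] env]
        env = dict(env)
        env[stmt[1]] = evaluate(stmt[2], env)
        return env
    if kind == "seq":                       # [[s1 ; s2]] env = [[s2]] ([[s1]] env)
        return execute(stmt[2], execute(stmt[1], env))
    raise ValueError(f"unknown statement: {kind}")

# Program: x := 1 ; y := x + 2
program = ("seq", ("assign", "x", ("num", 1)),
                  ("assign", "y", ("add", ("var", "x"), ("num", 2))))
print(execute(program, {}))  # {'x': 1, 'y': 3}
```

Because the interpreter is just the semantic equations written down, running a test program is the same act as checking the specification, which is the consistency check the workbench aims to automate at scale.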

Granularity Optimization and Scheduling on Massively Parallel Computer Systems

Despite recent advances in massively parallel processing (MPP), resource management remains a critical issue for achieving high performance on MPP architectures. Resource management involves code and data partitioning, scheduling, and control of program execution. How to manage these activities to achieve the fastest parallel execution for a single application is an important and challenging problem that so far has not been satisfactorily resolved. CASS, which stands for Clustering And Scheduling System, is a compiler optimization tool that provides facilities for automatic partitioning and scheduling of parallel programs on distributed memory architectures. Central to the design of CASS is grain-size optimization. The high communication overhead in current MPP architectures imposes a minimum grain-size (i.e., computation-to-communication ratio) below which performance degrades significantly. Using task clustering techniques, CASS restructures a parallel program into one whose grain-size matches that of the target architecture, while minimizing overall parallel execution time.

The completed work of this research in the past fiscal year includes:

• Developed two efficient clustering and scheduling algorithms, namely CASS1 (without task duplication) and CASS2 (with task duplication). Experimental results show that both algorithms outperform existing approaches.
• Extended CASS to dynamic task graphs.
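The clustering idea described above can be sketched in a few lines. The following is a hedged, simplified illustration (it is not the published CASS1/CASS2 algorithms; the graph, weights, and greedy policy are invented for the example, and tasks placed in the same cluster are not serialized in the makespan estimate): edges are considered in decreasing order of communication cost, and the clusters at an edge's endpoints are merged whenever merging, which zeroes that edge's communication cost, does not worsen the estimated parallel completion time.

```python
import functools

def makespan(weights, edges, cluster):
    """Longest path through the task DAG; an edge's communication cost
    is paid only when its endpoints lie in different clusters.
    (Simplification: same-cluster tasks are not serialized.)"""
    succs = {}
    for u, v, c in edges:
        succs.setdefault(u, []).append((v, c))

    @functools.lru_cache(maxsize=None)
    def finish(u):
        tail = 0
        for v, c in succs.get(u, []):
            comm = 0 if cluster[u] == cluster[v] else c
            tail = max(tail, comm + finish(v))
        return weights[u] + tail

    return max(finish(u) for u in weights)

def greedy_cluster(weights, edges):
    """Zero out edges (merge clusters), heaviest communication first,
    whenever the merge does not increase the makespan estimate."""
    cluster = {u: u for u in weights}
    best = makespan(weights, edges, cluster)
    for u, v, c in sorted(edges, key=lambda e: -e[2]):
        trial = {t: (cluster[u] if cluster[t] == cluster[v] else cluster[t])
                 for t in cluster}
        m = makespan(weights, edges, trial)
        if m <= best:
            cluster, best = trial, m
    return cluster, best

# Fork-join graph a -> {b, c} -> d: unit computation, heavy communication,
# i.e., a very fine-grained program on a high-latency machine.
weights = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [("a", "b", 10), ("a", "c", 10), ("b", "d", 10), ("c", "d", 10)]
clusters, best = greedy_cluster(weights, edges)
print(best)  # 3: clustering collapses the graph into one coarse task
```

On this example the communication cost dominates the computation, so the greedy pass coalesces all four fine-grain tasks into a single coarse one, exactly the grain-size adjustment that the paragraph above attributes to task clustering.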

Refereed Journal Papers

1. S. Rajasekaran and David S. L. Wei. Selection, routing and sorting on the star graph. Journal of Parallel and Distributed Computing, Accepted to appear, 1995.

Abstract: We consider the problems of selection, routing and sorting on an $n$-star graph (with $n!$ nodes), an interconnection network which has been proven to possess many special properties. We identify a tree-like subgraph (which we call a `$(k,1,k)$ chain network') of the star graph which enables us to design efficient algorithms for the above-mentioned problems. We present an algorithm that performs a sequence of $n$ prefix computations in $O(n^2)$ time. This algorithm is used as a subroutine in our other algorithms. We also show that sorting can be performed on the $n$-star graph in time $O(n^{3})$ and that selection of a set of uniformly distributed $n$ keys can be performed in $O(n^2)$ time with high probability. Finally, we also present a deterministic (non-oblivious) routing algorithm that realizes any permutation in $O(n^{3})$ steps on the $n$-star graph. There exists an algorithm in the literature that can perform a single prefix computation in $O(n\lg n)$ time. The best known previous algorithm for sorting has a run time of $O(n^3\lg n)$ and is deterministic. To our knowledge, the problem of selection has not been considered before on the star graph.

2. M. A. Palis, J. C. Liou, and David S. L. Wei. Task clustering and scheduling for distributed memory parallel architectures. IEEE Trans. on Parallel and Distributed Systems, Accepted to appear, 1995.

Abstract: This paper addresses the problem of scheduling parallel programs represented as directed acyclic task graphs for execution on distributed memory parallel architectures. Because of the high communication overhead in existing parallel machines, a crucial step in scheduling is task clustering, the process of coalescing fine grain tasks into single coarser ones so that the overall execution time is minimized. The task clustering problem is NP-hard, even when the number of processors is unbounded and task duplication is allowed. A simple greedy algorithm is presented for this problem which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. Indeed, the quality of the schedule improves as the granularity of the task graph becomes larger. For example, if the granularity is at least 1/2, the makespan of the schedule is at most 5/3 times optimal. For a task graph with $n$ tasks and $e$ inter-task communication constraints, the algorithm runs in $O(n(n \lg n + e))$ time, which is $n$ times faster than the currently best known algorithm for this problem. Similar algorithms are developed that produce: (1) optimal schedules for coarse grain graphs; (2) 2-optimal schedules for trees with no task duplication; and (3) optimal schedules for coarse grain trees with no task duplication.

Refereed Proceeding Papers

1. David S. L. Wei. Quicksort and permutation routing on the hypercube and deBruijn networks. In International Symposium on Parallel Architectures, Algorithms, and Networks, pages 238--245. IEEE Computer Society, IEEE Computer Society Press, December 1994.

Abstract: We consider the problems of sorting and routing on some unbounded interconnection networks, namely the hypercube and the de Bruijn network. We first present two efficient implementations of quicksort on the hypercube. The first algorithm sorts $N$ items on an $N$-node hypercube, one item per node, in $O(\frac{\log^{2}N}{\log\log N})$ time with high probability, while the other one sorts $N$ items on an $(N/\log N)$-node hypercube, $\log N$ items per node, in $O(\log^{2}N)$ time with high probability, which achieves optimal speedup in the sense of the $PT$ product. Both algorithms beat the previously best known parallel quicksort, which runs in $O(\log^{2} N)$ expected time on a butterfly of $N$ nodes. We also present a deterministic (non-oblivious) permutation routing algorithm which runs in $O(d \cdot n^{2})$ time on a $d$-ary de Bruijn network of $N=d^{n}$ nodes. To the best of our knowledge, this routing algorithm is so far the fastest deterministic one for the de Bruijn network of arbitrary degree. The previously best known one runs in $O((\log d) \cdot d \cdot n^{2})$ time. All algorithms presented are simple, the constants hidden behind the big Oh being small.

2. M. Palis, J. C. Liou, and David S. L. Wei. A greedy task clustering heuristic that is provably good. In International Symposium on Parallel Architectures, Algorithms, and Networks, pages 398--405. IEEE Computer Society, IEEE Computer Society Press, December 1994.

Abstract: A simple greedy algorithm is presented for task clustering with duplication (or recomputation) which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. For a task graph with $n$ tasks and $e$ inter-task communication constraints, the algorithm runs in $O(n(n \lg n + e))$ time, which is $n$ times faster than the currently best known algorithm for this problem. Similar algorithms are developed that produce: (1) optimal schedules for coarse grain graphs; (2) 2-optimal schedules for trees with no task duplication; and (3) optimal schedules for coarse grain trees with no task duplication.

3. D. F. Hsu and David Wei. Permutation routing and sorting on the directed de Bruijn network. In International Conference on Parallel Processing, Wisconsin, U.S.A., August 1995.

Abstract: We show that any deterministic oblivious routing scheme for permutation routing on a $d$-ary de Bruijn network of $N=d^{n}$ nodes will, in the worst case, take $\Omega(\sqrt{N})$ steps under the {\it single-port} model. This lower bound improves the results shown in \cite{BH82} and \cite{KKT90} provided $d$ is not a constant. A matching upper bound is also given. We also present an efficient general sorting algorithm which runs in $O((\log d) \cdot d \cdot n^{2})$ time for the directed de Bruijn network with $d^{n}$ nodes, degree $d$ and diameter $n$.

4. Harvey Abramson. Why Johnny Can't Read the Screwiest Writing System in the World and How to Help Him Learn: On the Necessity of Japanese--English Hyperdictionaries. In Proceedings - PACLING '93, April 1994.

5. Harvey Abramson. Logic Programming and Grammars. In The Encyclopedia of Language and Linguistics, 1994.

6. Harvey Abramson. Prolog Based Aids for the Study of Japanese. In Proceedings - Practical Applications of Prolog Conference, pages 1--13, April 1994.

7. Harvey Abramson. Breaking Out of the Web of Kanji or What is a Dictionary? In Proceedings - Language Engineering on the Information Highway Conference, September 1994.

8. Harvey Abramson and Veronica Dahl. Extending Logic Grammars with ID-LP Specifications. In Proceedings - Iberamia 94, IV Congreso Iberoamericano de Inteligencia Artificial, pages 230--246, October 1994.

9. K.-G. Doh and D. A. Schmidt. The facets of action semantics: some principles and applications. In Peter D. Mosses, editor, 1st International Workshop on Action Semantics, pages 1--15. BRICS, April 1994.

Abstract: A distinguishing characteristic of action semantics is its facet system, which defines the variety of information flows in a language definition. The facet system can be analyzed to validate the well-formedness of a language definition, to infer the typings of its inputs and outputs, and to calculate the operational semantics of programs. We present a single framework for doing all of the above. The framework exploits the internal subsorting structure of the facets so that sort checking, static analysis, and operational semantics are related, sound instances of the same underlying analysis. The framework also suggests that action semantics's extensibility can be understood as a kind of "weakening rule" in a "logic" of actions. In this paper, the framework is used to perform type inference on specific programs, to justify meaning-preserving code transformations, and to "stage" an action semantics definition of a programming language into a static semantics stage and a dynamic semantics stage.

10. K.-G. Doh. Action transformation by partial evaluation. In ACM-SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, pages 230--240. ACM SIGPLAN, June 1995.

Abstract: A compiler automatically generated by an action-semantics-directed compiler generator first translates a source program to its action denotation, and then to target code. To improve the efficiency of the target code, it is essential to transform the action denotation into one that leads to better target code. In this paper, an automatic action transformation based on partial evaluation is presented, along with a methodology for analyzing action-semantic equations, allowing the necessary analyses to be performed at compiler-generation time, before source programs are given.
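The core idea behind transformation by partial evaluation can be illustrated on a tiny expression language (this is only an illustrative sketch, not the paper's algorithm, and it operates on an invented tuple AST rather than real action notation): subterms whose operands are statically known are reduced at transformation time, while the dynamic parts are left behind as residual code for the target-code generator.

```python
# Illustrative partial evaluator: fold what is static, residualize the rest.
# The AST encoding and names are invented for this example.

def specialize(expr, static_env):
    """Rewrite expr, reducing parts that depend only on static_env."""
    kind = expr[0]
    if kind == "num":
        return expr
    if kind == "var":
        name = expr[1]
        # A statically known variable becomes a constant; others remain.
        return ("num", static_env[name]) if name in static_env else expr
    if kind == "add":
        left = specialize(expr[1], static_env)
        right = specialize(expr[2], static_env)
        if left[0] == "num" and right[0] == "num":
            return ("num", left[1] + right[1])  # fold the constant addition
        return ("add", left, right)             # residual code
    raise ValueError(f"unknown expression: {kind}")

# (x + 2) + y, with x statically known to be 3, residualizes to 5 + y.
residual = specialize(("add", ("add", ("var", "x"), ("num", 2)), ("var", "y")),
                      {"x": 3})
print(residual)  # ('add', ('num', 5), ('var', 'y'))
```

Performing such reductions once, at compiler-generation time, is what lets the generated compiler emit better target code without repeating the analysis for every source program.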

Theses

1. Jing-Chiou Liou, 1994.

Grain-Size Optimization and Task Scheduling on Distributed Memory Parallel Architectures, New Jersey Institute of Technology. Thesis advisor: David S. L. Wei (with Dr. Michael Palis).

Academic Activities

1. Harvey Abramson, Natural Language Understanding and Logic Programming, April 1994.

Member of the Program Committee, Natural Language Understanding and Logic Programming Workshop 5, Lisbon.

Others

1. David S. L. Wei, 1994. Invited speaker: National Tsing Hua University, Taiwan.

2. David S. L. Wei, 1995. Invited speaker: The New York Academy of Sciences.

3. David S. L. Wei, 1995. Invited speaker: Laboratory for Computer Science, Massachusetts Institute of Technology.

4. David S. L. Wei, 1994. Advisor of Student Karate Club, University of Aizu.

5. David S. L. Wei, 1994. Referee of IEEE Trans. on Computers, and Journal of Parallel and Distributed Computing.

6. David S. L. Wei. Invited speaker: National Taiwan University.

7. Kyung-Goo Doh, 1994. Referee for Journal of Functional Programming.

8. Kyung-Goo Doh, 1994.

Referee for ACM-SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation.

9. Kyung-Goo Doh, 1994. Talk at the Department of Computer Science, Aarhus University, Denmark.


www@u-aizu.ac.jp
January 1996