
Human Interface Laboratory

Masahide Sugiyama / Professor
Susantha Herath / Associate Professor
Michael Cohen / Assistant Professor
Minoru Ueda / Assistant Professor

Using our communication channels (sense organs: ears, mouth, eyes, nose, skin, etc.) we can communicate with one another: between human and human, human and machine, and human and any information source.

The research area of the Human Interface Laboratory covers the enhancement and generation of various human interface channels.

We have the following two main research topics:

To achieve these goals, we continued to build up our experimental environment. We introduced three HP workstations (HP9000/712 and 715 models) and installed our previously implemented server/client speech recognition system on them for parallel processing of speech recognition. A speech synthesizer was connected to a workstation (Sun/S10) to develop a man-machine dialogue system, which is also very useful for communication between blind people and computers. A workstation (Sun/SPARC) was introduced to develop a visual programming language for 4GL. For Audio Window (virtual acoustics) research, a NeXT workstation and a convolution engine were introduced.

To encourage the sign language research community, we held a workshop on Aug. 9th at the University of Aizu, sponsored by the Sign Language Technology Committee of IEICE. One of our members organized a meeting of the Speech Research Committee of IEICE on Oct. 13th at the University. We organized IWHIT94 (International Workshop on Human Interface Technology 1994) on Sep. 29th and 30th, sponsored by the International Affairs Committee of the University of Aizu. The workshop had five sessions (1. Speech Recognition and Translation, 2. Robustness in Speech Recognition, 3. Speaker Recognition and Segregation, 4. Virtual Acoustics, 5. Non-Verbal Communication), with 8 keynote lectures and 11 lectures.

We proposed a joint project on ``Multi-modal Human Interface for Handicapped People" to the Multimedia Information Center of the University of Aizu; the project was approved and received a budget of about 30 million yen.

We ran 6 SCCPs for students (Social Hyper Networking; Visual Language for Office Processing Software; Speech Dialogue System; Computer Music; Neural Network Modeling; Non-Verbal Communication), 3 joint projects (Study of Machine Processing of Signs Generated by Hand Movements; Study on Speech Recognition under Noisy Environments; Audio Windows: Spatialization of Synthesized Speech, Spatialization of Music, and Hierarchical Organization of Spatial Sound Sources), and 1 courseware project (Speech Processing and Speech Recognition). Our members received commissioned research funds from ATR Interpreting Telecommunications Research Laboratories for ``Study on Speech Recognition System Based on Information Theory" and from NTT Human Interface Laboratories for ``Audio Windows", and a research grant from the Ministry of Education for ``Robust Speech Recognition Using Microphone Array and Signal Source Modeling Technique".

We exhibited our research activities (the FPM-LR speech recognition system, Audio Windows, and sign language processing) at the open campus during the University Festival. We also held a Lab Open House for freshmen, at which three labs exhibited their research activities.

In our research activities, we presented 6 papers at refereed international conferences and 4 full papers in refereed academic journals. We ran a series of HI Lab seminars comprising 8 lectures. We published textbooks on human interface technology as the Human Interface Technology Series: Vol. 1, ``Speech Processing and Speech Recognition", and Vol. 2, ``Speech Recognition --- Advanced Technology --"; using a special educational fund we printed 50 copies of each.

One of our members organized a working group on ``Blind and Computer", and about 30 people attended its first meeting (March 12th, 1995). The topics were ``Personal computer environment for blind people", ``Multimedia Information Center in the University of Aizu", and ``Recent NTT speech research and products".

We started providing Human Interface Lab information on the WWW to open our research and education activities to the world: http://www/labs/sw-hi/HI.html.

Refereed Journal Papers

  1. Shigeaki Aoki, Michael Cohen, and Nobuo Koizumi. Design and control of shared conferencing environments for audio telecommunication. Presence: Teleoperators and Virtual Environments, 3(1):60--72, Winter 1994. A technique is presented for dynamically invoking a set of head-related transfer functions (HRTFs) and scaling gain, driven by a map in a graphical window. With such an interface, users may configure a virtual conferencing environment, manipulating the virtual positions of teleconferees. The design of a personal headphone teleconferencing prototype is proposed, integrating spatialized sound presentation with individualized HRTF measurement using a bifunctional transducer. According to subjective tests, the use of individualized HRTFs instead of dummy-head HRTFs can reduce front-back sound image confusion.
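
    The core idea of the abstract above — a map position driving both HRTF selection and gain scaling — can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation; the function names (`azimuth_deg`, `select_hrtf`, `gain`) and the 30-degree HRTF bank resolution are assumptions.

    ```python
    import math

    # Hypothetical sketch: a teleconferee's map position relative to the
    # listener yields an azimuth (which selects a measured HRTF) and a
    # distance (which scales amplitude).

    def azimuth_deg(listener, source):
        """Bearing of source from listener, in degrees [0, 360)."""
        dx = source[0] - listener[0]
        dy = source[1] - listener[1]
        return math.degrees(math.atan2(dx, dy)) % 360.0

    def select_hrtf(az, bank_resolution=30):
        """Quantize azimuth to the index of the nearest measured HRTF."""
        return int(round(az / bank_resolution)) % (360 // bank_resolution)

    def gain(listener, source, ref_dist=1.0):
        """Inverse-distance amplitude scaling, clamped at the reference."""
        d = math.dist(listener, source)
        return min(1.0, ref_dist / d) if d > 0 else 1.0
    ```

    Dragging an icon on the map would re-run this selection each frame, so the binaural rendering follows the virtual position continuously.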

  2. Michael Cohen. Cybertokyo: A survey of public VRtractions. Presence: Teleoperators and Virtual Environments, 3(1):87--93, Winter 1994.

    As in the Bauhaus movement of the '30s, artists and engineers are working together on commercial industrial (hardware) and post-industrial (software) design. Japan, a world leader in R/D areas like display technology and robotics, is a fertile environment in which VR (known here sometimes as AR [for artificial reality]) can flourish, both in labs and studios, and as consumer products and services: a confluence of theme parks, amusement centers, retail outlets, and home computer and media centers. Emphasizing the capture, transmission, and reproduction of experience, literally sensational VR is upon us, to simulate and stimulate. If it's hyped, or hyper, it's happening around Tokyo. Here's a selective guide to meta-holo-attractions open to the public in `The Big Orange.'

  3. Michael Cohen. Adaptive character generation and spatial expressiveness. TUGboat: Communications of the TeX Users Group, 15(3):192--198, September 1994.

    Zebrackets is a system of meta-METAFONTs to generate semi-custom striated parenthetical delimiters on demand. Contextualized by a pseudo-environment in LaTeX, and invoked by an aliased pre-compiler, Zebrackets are nearly seamlessly invokable in a variety of modes, manually or automatically generated marked matching pairs of background, foreground, or hybrid delimiters, according to a unique index or depth in the expression stack, in `demux,' unary, or binary encodings of nested associativity. Implemented as an active filter that re-presents textual information graphically, adaptive character generation can reflect an arbitrarily wide context, increasing the information density of textual presentation by reconsidering text as pictures and expanding the range of written spatial expression.

  4. J. Murakami, S. Sugiyama, and H. Watanabe. Study of unknown-multiple signal source clustering problem using Ergodic HMM. Trans. of IEICE, J78-D-II(2):197--204, 1995.

    In this paper, we consider signals that have originated from a sequence of sources. The problem of segmenting the signal and identifying segments to their sources is addressed. This problem has wide applications in many fields. This report describes a resolution method using Ergodic Hidden Markov Models (HMM). In this model, each HMM state corresponds to a signal source. The signal source sequence can therefore be determined by using Viterbi decoding over the observation sequence. Baum-Welch training can be used to estimate the HMM parameters from training material. As an application of the multiple signal source identification problem, an experiment was performed on unknown speaker identification. As a result, a classification rate of 79% for 4 male speakers was obtained. The results further indicated that the model is sensitive to the initial values of the Ergodic HMM and that the long distance LPC cepstrum is an effective way of preprocessing the signal.
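
    The decoding step described above — each ergodic-HMM state standing for one source, with Viterbi decoding labeling every frame — can be sketched minimally as follows. The toy two-source model (values near 0 vs. near 5, self-transition probability 0.9) is illustrative only and not from the paper.

    ```python
    import math

    # Minimal sketch: each state of a fully connected (ergodic) HMM
    # represents one signal source, so the best Viterbi state path both
    # segments the observation sequence and labels each frame's source.

    def viterbi(obs, states, log_pi, log_a, log_b):
        """Most likely state (source) sequence for the observations."""
        V = [{s: log_pi[s] + log_b[s](obs[0]) for s in states}]
        back = []
        for t in range(1, len(obs)):
            row, ptr = {}, {}
            for s in states:
                prev = max(states, key=lambda p: V[-1][p] + log_a[p][s])
                row[s] = V[-1][prev] + log_a[prev][s] + log_b[s](obs[t])
                ptr[s] = prev
            V.append(row)
            back.append(ptr)
        path = [max(states, key=lambda s: V[-1][s])]
        for ptr in reversed(back):
            path.append(ptr[path[-1]])
        return path[::-1]

    # Toy two-source model: source A emits values near 0, source B near 5;
    # sources tend to persist (self-transition probability 0.9).
    states = ["A", "B"]
    log_pi = {s: math.log(0.5) for s in states}
    log_a = {p: {s: math.log(0.9 if p == s else 0.1) for s in states}
             for p in states}
    log_b = {"A": lambda x: -(x - 0.0) ** 2, "B": lambda x: -(x - 5.0) ** 2}
    ```

    In the paper's setting the emission models would be trained with Baum-Welch rather than fixed by hand, and the decoded path directly yields the segmentation.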

Refereed Proceeding Papers

  1. A. Herath, Y. Hyodo, Y. Kawada, T. Ikeda, and S. Herath. A practical machine translation system from Japanese to modern Sinhalese. In The 1994 Joint Conference of the 8th Asian Conference on Languages Information, and Computation and the 2nd Pacific Asia Conference on Formal and Computational Linguistics, pages 1--5, August 1994.

    During the last few decades, the requirements of the international market imposed by economic forces have led to the necessity of developing effective and efficient electronic natural language processing tools. Many Machine Translation (MT) systems are being developed worldwide, especially in Japan and Europe, to address these challenges in the 21st century. The research and development of modern Sinhalese processing began recently. This paper discusses the similarities of Japanese and Sinhalese, the methodology used in the MT process, the problems encountered, and the present status and future plans.

  2. Michael Cohen and Nobuo Koizumi. Putting spatial sound into voicemail. In NR94: Proc. 1st International Workshop on Networked Reality in TeleCommunication, Tokyo, May 1994. IEEE COMSOC, IEICE.

    MAW's audio window reinterpretation of standard idioms for WIMP systems -- including draggably rotating icons, and directionalized and non-atomic spatial sound objects -- complements features that are especially well suited for asynchronous operations, including compatibility with hypermail (allowing spatial sound to be put into electronic mail). By embedding MAW documents, which might include dynamic effects, alongside voicemail, we tag each utterance as a spatial channel.

  3. Michael Cohen. Design for a GPS PGS. In Harry J. Murphy, editor, Proc. Virtual Reality and Persons with Disabilities, San Francisco, CA, June 1994. Center for Disabilities.

    Augmented reality describes hybrid presentations that overlay computer-generated imagery on top of real scenes. Augmented audio reality extends this notion to include sonic effects, superimposing artificially synthesized sounds on top of the naturally sampled soundscape (through an acoustically transparent speaker system). Spatial sound is the presentation of audio channels with positional attributes. DSP-synthesized spatial sound, driven by even a simple positional database, can denote directional cues useful to a blind user. Maw (acronymic for multidimensional audio windows) is a NextStep-based audio windowing system deployed as a binaural directional mixing console, capable of presenting such augmented audio reality spatial sound cues. A design for a computer navigator, coupled with a GPS (global positioning system), is described, using dynamically selected HRTFs (head-related transfer functions) to directionalize arbitrary audio signals. The system, targeted for blind users who enjoy outdoor walks, is intended to provide capability for an audio compass, homing beacon, or general sonic cursor or telepointer for team communication.

  4. Michael Cohen. Using audio windows to analyze music. In Greg Kramer, editor, Proc. Int. Conf. on Auditory Display, Santa Fe, NM, November 1994.

    Poster. ISBN 0-201-62603-9.

  5. M. Sugiyama. Speech segmentation and clustering based on acoustic gender features. In Proc. of ICSPAT94, pages 1687--1691, October 1994.

    This paper describes speech segmentation and clustering algorithms based on acoustic gender features, where speakers and speech context are unknown. As the simpler case, when speech segmentations are known, the Output Probability Vector Clustering algorithm is applied. In the case of unknown segmentations, an ergodic HMM-based technique is applicable. In this paper only the simpler case is focused on and evaluated using simulated multi-speaker dialogue speech data.
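
    The known-segmentation case above — clustering segments by an acoustic gender feature — can be sketched with a simple two-class clustering over one scalar feature per segment. Here mean fundamental frequency stands in for the paper's output probability vectors; the feature choice, the `two_means` helper, and the sample values are assumptions for illustration.

    ```python
    # Minimal sketch: cluster known speech segments into two groups by one
    # acoustic gender cue (mean F0 in Hz per segment), via 1-D 2-means.

    def two_means(values, iters=20):
        """Cluster scalar features into two groups; returns labels 0/1."""
        c = [min(values), max(values)]          # initial centers
        labels = [0] * len(values)
        for _ in range(iters):
            # assign each segment to its nearest center
            labels = [0 if abs(v - c[0]) <= abs(v - c[1]) else 1
                      for v in values]
            # recompute each center as the mean of its members
            for k in (0, 1):
                members = [v for v, l in zip(values, labels) if l == k]
                if members:
                    c[k] = sum(members) / len(members)
        return labels

    # Illustrative mean F0 per segment: male voices ~120 Hz, female ~220 Hz.
    segs = [118.0, 125.0, 210.0, 230.0, 122.0]
    ```

    The paper's actual algorithm clusters vectors of model output probabilities rather than raw pitch, but the grouping step follows the same nearest-center logic.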

  6. S. Takahashi, T. Ikeda, Y. Shinagawa, T. L. Kunii, and M. Ueda. Extracting features from discrete elevation data with topological integrity: Algorithms for extracting critical points and constructing topological graphs. In Proceedings of EuroGraphics '95, Amsterdam, The Netherlands, 1995. EUROGRAPHICS Association, North-Holland Publisher Co.


Books

  1. H. Yodogawa, M. Sugiyama, et al. Neural Networks, volume 2 of ATR Advanced Technology Series. Ohm Publishing Co., 1994.

Unrefereed Papers

  1. Michael Cohen. Using audio windows to analyze music. In IWHIT94: Proc. Int. Wkshp. on Human Interface Technology, pages 78--84, Aizu-Wakamatsu, Japan, September 1994.

  2. M. Sugiyama. Study on speech database retrieval using speech key. In ASJ, editor, Proc. of ASJ Fall Meeting, pages 59--60, October 1994.

  3. M. Sugiyama. Fast speech data retrieval using speech key. In ASJ, editor, Proc. of ASJ Spring Meeting, pages 199--200, March 1995.

  4. M. Sugiyama. Categorization based on signal source models. In Proc. of IWHIT94, pages 51--56, September 1994.

Technical Reports

  1. Michael Cohen and Elizabeth M. Wenzel. The design of multidimensional sound interfaces. 95-1-004, The University of Aizu, 1995.

  2. Michael Cohen. Multimedia is for everyone: Conferences, concerts, and cocktail parties. 95-1-005, The University of Aizu, 1995.


Grants

  1. Susantha Herath. Multi-modal human interface. General Research MMHI-2, Human Interface, April 1994.

  2. Michael Cohen. Audio window. Donation NTT-1, Research Grant, April 1994.

  3. Masahide Sugiyama. Robust speech recognition using microphone array and signal source modeling technique. High Priority Research Field 06232102, Speech Dialogue, April 1994.

Academic Activities

  1. Susantha Herath, IEEE, April 1994. IEEE Coordinator.

  2. Susantha Herath, International Workshop on Human Interface Technology 94 (IWHIT94) (1994.9.29-30), May 1994. Working committee member of IWHIT.

  3. Susantha Herath, Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem Solving Technologies, published by Kluwer Academic Publishers, April 1994. Reviewer.

  4. Susantha Herath, Sign Language Engineering (SiLE) meeting (Aug. 9th, 1994) in Aizu, August 1994. Organizer.

  5. Susantha Herath, SiLE, IEICE, April 1994. Member.

  6. Masahide Sugiyama, SiLE in IEICE, April 1994. Member of committee of Sign Linguistic Technology Research.

  7. Masahide Sugiyama, ASJ and IEICE, May 1994. Referee.

  8. Masahide Sugiyama, ASJ and IEICE, October 1994 and March 1995. Chairman of Sessions.

  9. Masahide Sugiyama, IEICE and ASJ, May 1994. Planning Secretary.

  10. Masahide Sugiyama, ASJ, May 1994. Member of Tohoku Regional Board.


Commissioned Research Funds

  1. Masahide Sugiyama, April 1994. ATR Institute Commissioned Research Fund.

  2. Masahide Sugiyama, April 1994. NTT Institute Commissioned Research Fund.

January 1996