Department of Computer Software

Human Interface Laboratory

Masahide Sugiyama
Professor
Michael Cohen
Associate Professor
Jie Huang
Associate Professor
Minoru Ueda
Assistant Professor
The Human Interface Laboratory consists of four faculty members. Each member has his own interests and pursues independent research activities:

Prof. Masahide Sugiyama:
  Participated in Open Campus with the title "Multimedia Processing", and demonstrated the "VC (Video Caption) Player" at Big Pallet, Kohriyama (Nov. 14, 2003). Promoted a series of computer training courses for visually handicapped people in the Aizu region.

Prof. Michael Cohen:
  Participated in Open Campus with lab demonstrations. Presented a public lecture with students (Saturday, Jan. 23, 2004, 10:30-12:00): "Spatial Media at the University of Aizu: 3D Computer Graphics and Audio" (http://www.u-aizu.ac.jp/~mcohen/spatial-media/koukaikouzas/2003-4/)

Prof. Jie Huang:
1. Computational Auditory Scene Analysis

We carried out intensive research on the integration and segregation factors for sound-component self-organization. Through psychological experiments, we examined the interactions between different primary cues for sound integration and segregation in the human auditory system, and sought quantitative relations usable for computational auditory scene analysis.
2. Robotic Spatial Sound Localization

We developed a robotic spatial sound localization system using an auditory interface with four microphones arranged on the surface of a spherical robot head. The time differences and intensity differences from a sound source to the different microphones are analyzed, based on HRTFs measured around the spherical head in an anechoic chamber.
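The time-difference cue underlying such localization can be illustrated with a minimal cross-correlation sketch (a generic two-microphone, free-field illustration, not the lab's actual system; the sampling rate, microphone spacing, and function names here are assumptions):

```python
import numpy as np

def estimate_tdoa(sig_a, sig_b, fs):
    """Estimate the time difference of arrival (seconds) of sig_a
    relative to sig_b by locating the cross-correlation peak."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)  # lag in samples
    return lag / fs

def tdoa_to_azimuth(tdoa, mic_spacing, c=343.0):
    """Far-field approximation: convert a TDOA between two microphones
    spaced mic_spacing meters apart into an arrival angle (radians)."""
    s = np.clip(c * tdoa / mic_spacing, -1.0, 1.0)
    return float(np.arcsin(s))

# Toy check: delay a noise burst by 5 samples and recover the delay.
fs = 8000
rng = np.random.default_rng(0)
b = rng.standard_normal(1024)
a = np.concatenate([np.zeros(5), b[:-5]])  # a is b delayed by 5 samples
tdoa = estimate_tdoa(a, b, fs)             # 5 / 8000 s
```

A real spherical-head system would combine several such pairwise estimates with the measured HRTFs, which also capture the intensity differences introduced by the head; this sketch covers only the free-field pairwise case.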

3. Robot Position Identification

We proposed a visual position identification method using colored rectangular signboards for a multimodal mobile robot designed to work in a room environment. The rectangular signboards are placed at several known positions. By calculating the vanishing points in the image of a signboard, the relative direction between the signboard and the robot can be obtained from a single image; the distance from the robot to the signboard can also be calculated.
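Under a simple pinhole-camera model, the geometry described above can be sketched as follows (an illustrative reconstruction, not the published algorithm; the focal length, principal point, and signboard width are assumed parameters):

```python
import math

def direction_from_vanishing_point(u_v, cx, f):
    """Yaw angle (radians) between the optical axis and the 3-D direction
    of the signboard edges whose image lines intersect at column u_v.
    Pinhole model: a vanishing point offset (u_v - cx) pixels from the
    principal point cx corresponds to tan(theta) = (u_v - cx) / f."""
    return math.atan2(u_v - cx, f)

def distance_from_apparent_width(width_m, width_px, f, theta=0.0):
    """Rough range estimate from a signboard of known physical width
    (meters) projecting to width_px pixels; cos(theta) compensates for
    foreshortening when the signboard is rotated by theta."""
    return f * width_m * math.cos(theta) / width_px

# Example: a vanishing point 800 px right of the principal point with an
# 800 px focal length implies the signboard edges run 45 degrees off-axis.
theta = direction_from_vanishing_point(900, 100, 800)   # pi/4 rad
dist = distance_from_apparent_width(1.0, 100, 800)      # fronto-parallel case
```

Because the signboard positions are known in advance, one relative direction and distance per signboard suffice to fix the robot's pose in the room.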

Refereed Journal Papers

[j-huang-01:2003]K. Yoshida and J. Huang. Competition between different primary cues for sound component organization in human audition. 3D Forum: J. of Three Dimensional Images, 16(4):42-47, 2002.
[mcohen-01:2003]Alamshah Bolhassan, Michael Cohen, and William L. Martens. VR4U2C: A Multiuser Multiperspective Panorama and Turnorama Browser Using QuickTimeVR and Java Featuring Multimonitor and Stereographic Display. TVRSJ: Trans. Virtual Reality Society of Japan, 9(1):69-78, 2004.

We have developed a multiuser multiperspective panoramic and object movie (turnorama) browser, using Apple's QuickTimeVR technology and the Java programming language with the support of the "QuickTime for Java" application programming interfaces (APIs). This novel QTVR browser allows coordinated display of multiple views of a virtual environment, limited practically only by the size and number of monitors or projectors available around users in various viewing locations. Named "VR4U2C" ("virtual reality for you to see"), the browser is one of many integrated clients in the University of Aizu Spatial Media Group's multimodal groupware suite and interoperates seamlessly with them. VR4U2C can be used interactively to explore and examine detailed multidimensional virtual environments (photorealistic or otherwise) using a computer and conventional input devices, as well as more exotic interfaces, including speaker-array spatial audio displays, mobile phones, and swivel chairs with servomotor control. Through a unique multinode dynamic traversal technique, VR4U2C provides an elegant solution to the problem of interactive stereoscopic display of QTVR imagery.
[mcohen-02:2003]Ashuboda Marasinghe, Stephen Lambacher, William Martens, Michael Cohen, Charith Giragama, Susantha Herath, and Garry Molholt. Universal Perceptual Attributes for Perception of American English Vowels by English and Japanese Native Speakers and Implications to Language Typology. Journal of Universal Languages, 4(2):117-145, 2003.

A universal perceptual space for 10 American English vowel sounds was derived for two groups of listeners: a group of native speakers of English and a group of native speakers of Japanese. Subsets of these two groups made ratings on 12 bipolar adjective scales for the same set of sounds, each of the two groups using anchoring adjectives taken from their native language. Although there was no evidence of any difference between the two groups in their INDSCAL-derived perceptual dimensions for these vowel sounds, the adjectives were used differently in describing those same perceptual dimensions by the two groups. Though a few of the adjectives were used to describe similar perceptual variations, one language-typological implication of this investigation is that caution should be exercised in generalizing semantic differential ratings obtained in one language, especially when those ratings are intended to aid in the interpretation of data from listeners speaking a different native language. Keywords: universal perceptual space, language typology, semantic differential analysis.
[mcohen-03:2003]Uresh Chanaka Duminduwardena, Kazuya Adachi, Owen Noel Newton Fernando, Makoto Kawaguchi, and Michael Cohen. Narrowcasting Operations for Multipresent Chatspace Avatars in Collaborative Virtual Environments. 3D Forum: J. of Three Dimensional Images, 18(1):129-135, 2004.

Our group is exploring interactive multi- and hypermedia, especially regarding virtual and mixed reality groupware systems. The apparent paradoxes of multipresence, having avatars in multiple places or spaces simultaneously, are resolvable by an "autofocus" feature, which uses reciprocity, the logical exchangeability of source and sink, to project overlaid soundscapes and simulate the precedence effect to consolidate the audio display. Our goal is to develop user interfaces to control source→sink transmission in synchronous groupware (like teleconferences, chatspaces, virtual concerts, etc.). We have developed two interfaces for narrowcasting (selection) functions in collaborative virtual environments (CVEs): one for a workstation-style WIMP (windows/icon/menu/pointer) GUI (graphical user interface), and one for a networked mobile device, a third-generation mobile phone. The interfaces are integrated with other CVE clients, interoperating with a heterogeneous groupware suite. The narrowcasting operations comprise an idiom for selective attention or presence. Keywords and Phrases: audibility permissions and protocols, CSCW (computer-supported collaborative work), chatspace, graphical (binaural directional) mixing console, groupware, mobile computing, narrowcasting functions, soundscape superposition, spatial sound, teleconferencing.
[mcohen-04:2003]Chandrajith Ashuboda Marasinghe, William L. Martens, Stephen Lambacher, Michael Cohen, Susantha Herath, and Ajith P. Madurapperuma. Relating a Common Perceptual Space for American English Vowels to Multilingual Verbal Attributes. 3D Forum: J. of Three Dimensional Images, 17(4):144-149, 2003.

A common perceptual space for 10 American English vowel sounds was derived for two groups of listeners: a group of native speakers of Japanese, and a group of native speakers of Sinhala, a language of Sri Lanka. The stimuli used in the experiment were the ten vowel sounds synthesized using the often-utilized formant frequency values published by Peterson and Barney in 1952. Subsets of these two groups made ratings on 12 bipolar adjective scales for the same set of sounds, each of the two groups using anchoring adjectives taken from their native language. Though there was no evidence of any difference between the two groups in their INDSCAL-derived perceptual dimensions for these vowel sounds, the adjectives were used differently in describing those same perceptual dimensions by the two groups. The results of semantic differential analysis support the conclusion that the two groups' ratings on the 12 bipolar adjective scales related somewhat differently to the dimensions of their shared perceptual space. Though a few of the adjectives were used to describe similar perceptual variations, one implication of this investigation is that caution should be exercised in generalizing semantic differential ratings obtained in one language, especially when those ratings are intended to aid in the interpretation of data from listeners speaking a different native language. Keywords: vowel perception, perceptual space, semantic differential analysis.

Refereed Proceeding Papers

[j-huang-02:2003]J. Huang. Spatial auditory processing for a hearing robot. In Proc. IEEE Int. Conf. Multimedia and Expo, Aug. 2002.
[j-huang-03:2003]J. Huang, K. Kume, A. Saji, M. Nishihashi, T. Watanabe, and W. Martens. A multimodal tele-robot as a mobile intelligent human interface: spatial processing and 3-D sound human interface. In Proc. Int. Conf. Distributed Multimedia Systems, Sep. 2002.
[j-huang-04:2003]K. Yoshida and J. Huang. Competition between different primary cues for sound component organization in human audition. In Proc. Fifth Int. Conf. Humans and Computers, Oct. 2002.
[j-huang-05:2003]J. Huang, Ohtake, and C. Zhao. Robot position calibration using colored rectangle signboards. In Proc. Third Int. Conf. Computer and Information Technology, Oct. 2002.
[j-huang-06:2003]J. Huang, K. Kume, A. Saji, M. Nishihashi, T. Watanabe, and W. Martens. Robotic spatial sound localization and its 3-D sound human interface. In Proc. First Int. Sym. Cyber Worlds, Theories and Practices, Nov. 2002.
[j-huang-07:2003]H. Sato and J. Huang. Investigating the quantitative factors for sound integration and segregation in human audition: harmonicity, frequency distance and common frequency modulation. In Proc. 9th Australian Int. Conf. Speech Science and Technology, Dec. 2002.
[mcohen-05:2003]Owen Noel Newton Fernando, Kazuya Adachi, and Michael Cohen. Phantom sources for separation of listening and viewing positions for multipresent avatars in narrowcasting collaborative virtual environments. In Proc. MNSA: Int. Wkshp. on Multimedia Network Systems and Applications (in conjunction with ICDCS: 24th Int. Conf. on Distributed Computing Systems), Hachioji, Tokyo: Tokyo University of Technology, March 2004.

Our group is exploring interactive multimedia, especially for virtual and mixed reality groupware systems. The apparent paradoxes of multipresence, having avatars in multiple places or spaces simultaneously, are resolvable by an "autofocus" feature, which uses reciprocity, the logical exchangeability of source and sink, to project overlaid soundscapes and simulate the precedence effect to consolidate the audio display. This paper reviews an interface to control source→sink transmissions in synchronous groupware (like teleconferences, chatspaces, virtual concerts, etc.), supporting spatial audio multipresence with narrowcasting functions in a collaborative virtual environment, implemented in a Java3D application, "Multiplicity." The interface allows interaction with other clients in our groupware suite (including mobile phone, stereographic panoramic browser, and multi-speaker systems). "Phantom sources" are used to control superposition of soundscapes relative to a selected viewpoint. Relative displacement from sources→sinks can be used to display phantom sources from alternate locations, exocentrically visibly and endocentrically auditorily. An extra feature of the phantom source displacement is the accommodation of a rotatable speaker axis (including median-plane arrangement).

[mcohen-06:2003]Owen Noel Newton Fernando, Kazuya Adachi, Uresh Chanaka Duminduwardena, Makoto Kawaguchi, and Michael Cohen. Audio Narrowcasting for Multipresent Avatars on Workstations and Mobile Phones. In Proc. ICAT2003: Thirteenth Int. Conf. on Artificial Reality and Telexistence, pages 106-113, Tokyo: Keio University, December 2003. VRSJ (Virtual Reality Society of Japan).

Our group is exploring interactive multi- and hypermedia, especially applied to virtual and mixed reality groupware systems. The apparent paradoxes of multipresence, having avatars in multiple places or spaces simultaneously, are resolvable by an "anycast" or "autofocus" feature to project overlaid soundscapes and simulate the precedence effect to consolidate the audio display. Our goal is to develop user interfaces to control source→sink transmission in synchronous groupware (like teleconferences, chatspaces, virtual concerts, etc.). We have developed two interfaces for narrowcasting (selection) functions in collaborative virtual environments (CVEs): one for a workstation-style WIMP (windows/icon/menu/pointer) GUI (graphical user interface), and one for a networked mobile device, a 2.5G mobile phone. The interfaces are integrated with other CVE clients, interoperating with a heterogeneous groupware suite, including stereographic panoramic browsers and spatial audio backends and speaker arrays. The narrowcasting operations comprise an idiom for selective attention, presence, and privacy: an infrastructure for rich conferencing capability.
[mcohen-07:2003]Uresh Chanaka Duminduwardena, Kazuya Adachi, Owen Noel Newton Fernando, and Michael Cohen. Narrowcasting Operations for Multipresent Chatspace Avatars in Collaborative Virtual Environments, Part I. In Proc. HC-2003: Sixth Int. Conf. on Human and Computer, pages 14-19, Aizu-Wakamatsu, August 2003.
[mcohen-08:2003]Makoto Kawaguchi and Michael Cohen. Narrowcasting Operations for Multipresent Chatspace Avatars in Collaborative Virtual Environments, Part II. In Proc. HC2003: Sixth Int. Conf. on Human and Computer, pages 20-23, Aizu-Wakamatsu, August 2003.
[mcohen-09:2003]Michael Cohen and Makoto Kawaguchi. Mobile Control in Cyberspace of Image-based & Computer Graphic Scenes and Spatial Audio Using Stereo QTVR and Java3D. In Eoin Brazil and Barbara Shinn-Cunningham, editors, Proc. ICAD: Int. Conf. on Auditory Display, pages 136-139, Boston, July 2003.

We have developed an interface for narrowcasting (selection) functions for a networked mobile device deployed in a collaborative virtual environment (CVE). Featuring a variable number of icons in a `2.5D' application, the interface can be used to control motion, sensitivity, and audibility of avatars in a teleconference or chatspace. The interface is integrated with other CVE clients through a `servent' (server/client hybrid) HTTP↔TCP/IP gateway, and interoperates with a heterogeneous groupware suite to interact with other clients, including stereographic panoramic browsers and spatial audio backends and speaker arrays. Novel features include mnemonic conferencing selection function keypad operations, multiply encoded graphical display of such non-mutually exclusive attributes, and explicit multipresence features. Keywords and Phrases: audibility permissions and protocols, CSCW (computer-supported collaborative work), graphical (binaural directional) mixing console, groupware, mobile computing, narrowcasting functions, soundscape superposition, solid user interface, spatial sound, teleconferencing.
[mcohen-10:2003]Wenxi Chen, Michael Cohen, and Daming Wei. A Cordless Sensor for Ubiquitous Health Monitoring. In Proc. 42nd Japan Medical Engineering Conf., Sapporo, June 2003.
[sugiyama-01:2003]Yamato Wada and Masahide Sugiyama. Time Alignment for Scenario and Sounds with Voice, Music and BGM. In Proc. of EuroSpeech2003, page ThA32p.2, Sep. 2003.

Unrefereed Papers

[sugiyama-02:2003]Tomokazu Muto and Masahide Sugiyama. Efficient Voice Decomposition Method under Time Constraint. In Technical Report of Speech Processing, pages 49-54, Japan, May 2003. ASJ.
[sugiyama-03:2003]Yoko Nishizawa and Masahide Sugiyama. Correspondence between Sound and Text for Caption Generation. In Technical Report of Speech Processing, pages 7-12. IEICE/ASJ, IEICE/ASJ, Jan. 2004.
[sugiyama-04:2003]Tomoko Okamoto and Masahide Sugiyama. Video Caption Player 2.4: Displaying Captions for Multiple Speakers. In Technical Report of HI Research Committee, pages 1-6. HI, IPSJ, Mar. 2004.
[sugiyama-05:2003]Shinichi Takeuchi and Masahide Sugiyama. Detection of Speech/Music. In Proc. of ECEI2003, pages 2I-26, Aug. 2003.
[sugiyama-06:2003]Yoko Nishizawa and Masahide Sugiyama. Correspondence between Sound and Text for Caption Generation. In Proc. of 1st IPSJ Tohoku Regional Workshop, pages B-2-3. IPSJ Tohoku-Region, Nov. 2003.
[sugiyama-07:2003]Masafumi Kurita and Masahide Sugiyama. Comparison between VQ and GMM for Laughter Detection. In Proc. of 1st IPSJ Tohoku Regional Workshop, pages B-2-4. IPSJ Tohoku-Region, Nov. 2003.
[sugiyama-08:2003]Shin'ichi Takeuchi and Masahide Sugiyama. Voice/Music Detection Method Using Short-time Information. In Proc. of 1st IPSJ Tohoku Regional Workshop, pages B-2-2. IPSJ Tohoku-Region, Nov. 2003.
[sugiyama-09:2003]Yoko Nishizawa, Yuuki Mori, and Masahide Sugiyama. Automatic Correspondence between Speech and Text Using Speech Language Information and Speech Recognition Results. In Proc. of 5th IPSJ Tohoku Regional Workshop, pages A3-4. IPSJ Tohoku-Region, Mar. 2004.
[sugiyama-10:2003]Shunsuke Kita and Masahide Sugiyama. Paragraph Correspondence of Rakugo Texts. In Proc. of 5th IPSJ Tohoku Regional Workshop. IPSJ Tohoku-Region, Mar. 2004.

Chapters in Book

[j-huang-08:2003]S. Ding and J. Huang. Knowledge-Based Intelligent Information and Engineering Systems, Lecture Notes in Artificial Intelligence 2773, chapter "Recursive approach for real-time blind source separation of acoustic signals". Springer, 2003.

Grants

[sugiyama-11:2003]Masahide Sugiyama. Fukushima Prefectural Foundation for Advancement of Science and Education, 2003.

Academic Activities

[mcohen-12:2003]Michael Cohen, 2004.

Program Committee, ICAD: 10th Meeting of Int. Conf. on Auditory Displays
[mcohen-13:2003]Michael Cohen, 2004.

Program Committee, MNSA: Sixth Int. Wkshp. on Multimedia Network Systems and Applications
[mcohen-14:2003]Michael Cohen, 2004.

Program Committee, ICAT: 13th Int. Conf. on Artificial Reality and Telexistence
[sugiyama-12:2003]Masahide Sugiyama, Apr. 2003.

Associate Editor of ED, IEICE
[sugiyama-13:2003]Masahide Sugiyama, 2003.

Editor of Acoustic Technology Series, ASJ
[sugiyama-14:2003]Masahide Sugiyama, 2003.

Member of Council, ASJ
[sugiyama-15:2003]Masahide Sugiyama, 2003.

Member of Tohoku Regional Board, ASJ
[sugiyama-16:2003]Masahide Sugiyama, Mar. 2004.

Session Chairman of Speech B in Spring Conference, ASJ

Ph.D and Other Theses

[j-huang-09:2003]Yuya Futamura. Graduation Thesis: Influence of Echoes and Reverberations on Perceptual Organization of Sound in Human Audition, University of Aizu, 2003.

Thesis Advisor: Huang, J.

[j-huang-10:2003]Mina Kawauchi. Graduation Thesis: Perceptual Separation of Overlapped Vowels, University of Aizu, 2003.

Thesis Advisor: Huang, J.

[j-huang-11:2003]Masahide Sato. Graduation Thesis: Telerobot as a Web Server, University of Aizu, 2003.

Thesis Advisor: Huang, J.

[j-huang-12:2003]Nobuhiko Saito. Graduation Thesis: Cancellation of Robot's Own Sound from Its Input Signals, University of Aizu, 2003.

Thesis Advisor: Huang, J.

[j-huang-13:2003]Yoshiko Inoue. Graduation Thesis: The Torso Effect for HRTF based 3-D Sound Reproduction, University of Aizu, 2003.

Thesis Advisor: Huang, J.

[j-huang-14:2003]Shinya Tsunematsu. Graduation Thesis: Perceptual Evaluation of the Effect of Torso Reflection on Directional HRTFs, University of Aizu, 2003.

Thesis Advisor: Huang, J.

[j-huang-15:2003]Shigetoshi Toyomisaka. Graduation Thesis: ECG Signal Quality Evaluation and Electrode Fail Detection for Health Care System, University of Aizu, 2003.

Thesis Advisor: Huang, J.

[j-huang-16:2003]Hiromi Hatano. Graduation Thesis: Locomotion Control of AIBOs for RoboCup, University of Aizu, 2003.

Thesis Advisor: Huang, J.

[j-huang-17:2003]Takeshi Fujiwara. Graduation Thesis: Recognition of Black-and-white Soccer Ball by AIBOs, University of Aizu, 2003.

Thesis Advisor: Huang, J.

[j-huang-18:2003]Yoshiyuki Toyoda. Master Thesis: Environmental Sound Recognition by the Instantaneous Spectrum Combined with the Time Pattern of Power, University of Aizu, 2003.

Thesis Advisor: Huang, J.

[j-huang-19:2003]Akira Saji. Master Thesis: 3-D Sound Reproduction by HRTFs Combined with the Amplitude Panning Method, University of Aizu, 2003.

Thesis Advisor: Huang, J.

[mcohen-15:2003]Masato Fukuoka. Developing a Dancing Music Sound System Using RSS-10 and PSFC, University of Aizu, 2003-4.

Thesis Advisor: Michael Cohen with Masahiro Sasaki

[mcohen-16:2003]Ryutaro Higuchi. MIDI File Based Virtual Sound Source Positioning Using RSS-10, University of Aizu, 2003-4.

Thesis Advisor: Michael Cohen with Masahiro Sasaki

[mcohen-17:2003]Etsuko Nemoto. µVR4U2C: A Mobile Stereographic Panorama Browser in a Collaborative Virtual Environment, University of Aizu, 2003-4.

Thesis Advisor: Michael Cohen with Alam Bolhassan

[mcohen-18:2003]Kazuya Adachi. Narrowcasting and Autofocus Function in CVEs with Multipresence, University of Aizu, 2003-4.

Thesis Advisor: Michael Cohen with Newton Fernando

[mcohen-19:2003] Shota Nakayama. Simulation of VR4U2C deployed in Schaire Internet Chair, University of Aizu, 2003-4.

Thesis Advisor: Michael Cohen with Alam Bolhassan

[mcohen-20:2003]Shuuhei Ishikawa. Extending iCon Using Way-Finding Operation for Dynamic Control, University of Aizu, 2003-4.

Thesis Advisor: Michael Cohen with Makoto Kawaguchi

[mcohen-21:2003]Kazuhiko Sawahata. ALAN Concento CVE Client, University of Aizu, 2003-4.

Thesis Advisor: Michael Cohen with Uresh Duminduwardena

[mcohen-22:2003]Yu Takaya. Shepherd Tones: a panning sound illusion, University of Aizu, 2003-4.

Thesis Advisor: Michael Cohen

[mcohen-23:2003]Takeshi Masumoto. Musically Useful Range of Chorus Depth and Rate, University of Aizu, 2003-4.

Thesis Advisor: Michael Cohen with William Martens

[mcohen-24:2003]Yousuke Suzuki. Development and Evaluation of a Modulation Effects Processor for Natural Sound of Vibrato, University of Aizu, 2003-4.

Thesis Advisor: Michael Cohen with William Martens

[mcohen-25:2003]Hisao Kaminaga. The Meaning of Thickness on Musical Sound, University of Aizu, 2003-4.

Thesis Advisor: Michael Cohen with William Martens

[mcohen-26:2003]Hitoshi Kojima. Automatic Generation of Natural Sounding Vibrato in Guitar Effects Processing, University of Aizu, 2003-4.

Thesis Advisor: Michael Cohen with William Martens

[sugiyama-17:2003]Tomoko Okamoto. Graduation Thesis: Development of Video Caption Player 2.4, University of Aizu, 2003.

Thesis Advisor: Masahide Sugiyama

[sugiyama-18:2003]Shunsuke Kita. Graduation Thesis: Correspondence between Rakugo Texts, University of Aizu, 2003.

Thesis Advisor: Masahide Sugiyama

[sugiyama-19:2003]Yuuki Mori. Graduation Thesis: Correspondence between Rakugo Speech, University of Aizu, 2003.

Thesis Advisor: Masahide Sugiyama

[sugiyama-20:2003]Yutaka Igarashi. Graduation Thesis: Design of Laugh Machine, University of Aizu, 2003.

Thesis Advisor: Masahide Sugiyama

[sugiyama-21:2003]Nagisa Ujiie. Graduation Thesis: Generation of Laughters, University of Aizu, 2003.

Thesis Advisor: Masahide Sugiyama

[sugiyama-22:2003]Masafumi Kurita. Master Thesis: Automatic Laughter Detection for Video Caption System, University of Aizu, 2003.

Thesis Advisor: Masahide Sugiyama