The Hearing Robot
The ultimate goal of this project is the development of a multimedia
interactive autonomous mobile robot.
In the first phase of development,
the robot will be able to localize and track sound sources in complex environments,
separate target sounds from environmental noise,
and understand its environment from sounds,
e.g., human voices, footsteps, sirens, and crashes.
The robot will be able to avoid obstacles and move to a destination autonomously.
Visual and auditory interfaces will be provided.
- Multi-purpose
The purpose of this project is to develop a multipurpose robot
that can be used in several ways.
For example, the robot can serve as a guard robot, or as an agent robot
that attends a meeting in place of its user,
so that the user can receive the remote visual scenes of the meeting room
as well as the auditory scenes.
- Multi-mode
Consider a guard robot patrolling our campus:
we would need the robot to work in two different modes.
One is the autonomous mode, in which the robot automatically moves
around the campus and checks for possible suspicious objects.
The other is the remote control mode:
when the robot has found a suspicious object,
it will be controlled by its owner, who will order the robot
to approach the object and examine it in more detail.
However, recent research on mobile robots has concentrated almost entirely
on the autonomous aspect. In this project, we will provide the robot
with an interactive mode as well as an autonomous mode.
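As a rough illustration, the two modes could be organized as a small
state machine. The sketch below is a hypothetical Python outline; the
mode names, the "release" command, and the step() function are
illustrative assumptions, not part of the actual control software:

```python
from enum import Enum, auto

class Mode(Enum):
    AUTONOMOUS = auto()      # patrol and watch for suspicious objects
    REMOTE_CONTROL = auto()  # owner steers the robot over the radio link

def step(mode, suspicious_detected, operator_command=None):
    """One control cycle: patrol autonomously, hand control to the
    owner when something suspicious is found, and resume the patrol
    when the owner releases control."""
    if mode is Mode.AUTONOMOUS:
        if suspicious_detected:
            return Mode.REMOTE_CONTROL  # notify the owner, await commands
        # ... continue the patrol: plan a path, avoid obstacles ...
        return Mode.AUTONOMOUS
    if operator_command == "release":
        return Mode.AUTONOMOUS
    # ... execute the owner's motion command ...
    return Mode.REMOTE_CONTROL
```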
- Multi-sensor
The multimodal robot must be capable of handling multimedia resources,
especially sound, to complement vision.
The visual sensor is one of the most popular sensors used today for
mobile robots.
Since a robot generally looks at the external world through a camera,
difficulties occur when an object does not lie in the visual field
of the camera or when the lighting conditions are poor.
A robot cannot detect a nonvisual event, which in many cases may
nevertheless be accompanied by a sound emission.
In these situations, the most useful information is given by audition.
Audition is one of the most important senses
used by humans and animals to recognize their environments.
Although the spatial resolution of audition is relatively low
compared with that of vision,
the auditory system can complement and cooperate with vision systems.
For example, sound localization can enable the robot to direct its camera
to a sound source.
- Base Platform: The robot is based on the LABO-3 platform of
Applied AI Systems, Inc.
The LABO-3 platform is a wheel-based mobile robot.
It is equipped with infrared and tactile sensors for obstacle avoidance,
and a microprocessor for steering control and sensor signal processing.
It is driven by two-wheel differential steering.
- Auditory Sensors: Four microphones are arranged on the surface of a sphere:
three microphones (x, y, z) lie in the same horizontal plane at the vertices
of an equilateral triangle, and the fourth microphone (o) is placed at their
center, at a height such that the four microphones (o-xyz) form a Cartesian
coordinate system (i.e., ox, oy, and oz are at right angles).
The diameter of the sphere is about 30 cm.
(A direction-estimation sketch based on this geometry follows this hardware list.)
- Vision Sensor: A fixed directional CCD camera is mounted
on the front of the robot. The camera is planned to be upgraded
to an active camera system with a pan/tilt servo platform.
- Communication: A bi-directional radio modem is used
for host-robot communication. Four FM transmitters and one video transmitter
are used for audio and video signal transmission, respectively.
- Audio4-USB: A DSP-based data acquisition system
with a Motorola DSP56307 (24-bit fixed-point) processor.
It has four channels of audio input and output
and connects to the host PC via a USB interface.
- TriMedia DSP Board: A PCI plug-in DSP board
with a TriMedia TM1300 DSP processor (Philips) for general digital video,
audio, graphics and telecommunications processing.
It supports video and audio input/output.
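Because ox, oy, and oz are mutually perpendicular, the microphone array
described above lends itself to a simple far-field direction estimate:
the arrival-time difference along each axis gives one direction cosine.
Below is a minimal sketch, assuming a far-field source and a baseline
length D that would have to be measured on the real array; the function
name and the sign convention for the delays are illustrative:

```python
import numpy as np

C = 343.0  # speed of sound in air [m/s]
D = 0.15   # assumed o-x / o-y / o-z baseline length [m]

def direction_from_tdoa(dt_x, dt_y, dt_z):
    """Estimate a unit vector toward a far-field source.
    dt_i is (arrival time at o) - (arrival time at microphone i),
    so dt_i > 0 means microphone i heard the sound first, i.e. the
    source lies on the +i side.  With mutually perpendicular
    baselines, each delay yields one direction cosine directly."""
    g = np.array([dt_x, dt_y, dt_z]) * C / D  # direction cosines
    n = np.linalg.norm(g)
    return g / n if n > 0 else g

# A source on the +x axis reaches microphone x one baseline earlier:
print(direction_from_tdoa(D / C, 0.0, 0.0))  # -> [1. 0. 0.]
```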
Useful Auditory Functions
- Sound Localization
- Auditory Scene Analysis:
When talking with other people in a noisy room or walking in the street,
the auditory system faces the problem of
separating a complex sound signal into different sound streams.
Despite the difficulty that the acoustic events produced
by different sources may overlap in time and frequency
and may share the same major characteristics,
the human auditory system can identify and separate the sources without effort.
What methods are at work in the human auditory system?
As a mathematical model, the task of separating multiple sources can be
described as solving a multiple-input, multiple-output system
knowing only the outputs, without knowing the inputs or the system.
There are now some algorithms that solve this problem under
the assumption that the inputs of the different channels are mutually independent.
However, these methods require heavy computation, especially
when the positions of the sound sources and receivers vary over time.
Another limitation is that they cannot separate inputs coming from the same position.
Such algorithms obviously do not satisfy the requirements of the auditory system.
In this project, we will give a computational model
for the primary sound organization stage of the auditory system.
Algorithms will be constructed based on spatial cues
and known psychoacoustic cues, e.g., harmonicity, temporal continuity,
synchronized AM/FM, and other sound features.
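To make the formulation above concrete, the toy sketch below simulates
the two-source, two-microphone case with an instantaneous mixing model
and recovers the sources with scikit-learn's FastICA under the
independence assumption. This only illustrates the independence-based
approach discussed above, not the model developed in this project:
real rooms mix convolutively, and ICA leaves the scale and ordering of
the recovered sources undetermined.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Instantaneous mixing model: x(t) = A s(t).  Only x is observed;
# both the sources s and the mixing matrix A are unknown.
t = np.linspace(0.0, 1.0, 8000)
s = np.c_[np.sin(2 * np.pi * 440 * t),          # tonal source
          np.sign(np.sin(2 * np.pi * 3 * t))]   # square-wave source
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                      # unknown mixing matrix
x = s @ A.T                                     # two microphone signals

# Recover the sources, assuming they are mutually independent.
ica = FastICA(n_components=2, random_state=0)
s_hat = ica.fit_transform(x)  # estimated sources (up to scale/order)
```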
- Sound Understanding:
Sound understanding is important for an autonomous robot to recognize its environment.
This is especially true in our research because we use the auditory system
as the major part of the sensing system.
Here, we will not focus on speech recognition, which refers to
understanding the meaning of spoken language. As a first step toward
sound understanding, we will restrict the targets to certain specific
sounds, e.g., human voices, telephone bells, door knocks, doors opening,
footsteps, sirens, crashes, and so on. Sound understanding will enable
the robot to act in response to what is happening. For sound
understanding, any existing machine learning
technology can be used. Examples include neural networks, decision
trees and decision rules. In general, neural networks are good at
learning in changing environments, and decision trees (or rules) are
good at selecting useful features for recognition. We are going to
combine both symbolic and non-symbolic approaches, to make use of the
advantages of both sides.
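As a minimal sketch of this idea, the toy example below extracts a few
crude features (log energy, zero-crossing rate, spectral centroid) and
trains a scikit-learn decision tree to tell tonal "siren-like" clips
from noisy "crash-like" clips. The features, classes, and synthetic
training data are all illustrative assumptions, not the project's
actual recognizer:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

RATE = 16000  # assumed sampling rate [Hz]

def features(clip):
    """Crude clip-level features: log energy, zero-crossing rate,
    and spectral centroid."""
    energy = np.log(np.mean(clip ** 2) + 1e-12)
    zcr = np.mean(np.abs(np.diff(np.sign(clip)))) / 2.0
    spec = np.abs(np.fft.rfft(clip))
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / RATE)
    centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-12)
    return [energy, zcr, centroid]

# Synthetic training clips: pure tones vs. white noise.
rng = np.random.default_rng(0)
t = np.arange(RATE // 4) / RATE
sirens = [np.sin(2 * np.pi * (600 + 50 * k) * t) for k in range(10)]
crashes = [rng.standard_normal(t.size) for _ in range(10)]
X = [features(c) for c in sirens + crashes]
y = ["siren"] * 10 + ["crash"] * 10

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict([features(np.sin(2 * np.pi * 700 * t))]))  # ['siren']
```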
- All-directional: Compared with vision, audition is all-directional.
When a sound source emits energy, the sound fills the air,
and a sound receiver (microphone) receives the sound energy from all directions.
Some specialized cameras can also capture an image in all directions,
but the robot still has to scan the whole image to locate a specific object.
Audition mixes the signals into a one-dimensional time series,
making it easier to detect an urgent or emergency event.
- No Need for Illumination:
Audition requires no illumination,
enabling a robot to work in darkness or low light.
- Less Affected by Obstacles:
Audition is also less affected by obstacles,
so a robot can perceive auditory information from sources behind obstacles.
One example is localizing a sound source outside a room or around a corner:
the robot first localizes the sound source in the area of the door or corner,
then travels to that point and listens again, finally locating the sound source.
last modified: 16 August 2000 by Jie Huang
(j-huang@u-aizu.ac.jp)