◆ Annual Review 2002

Image Processing Laboratory

Ryuichi Oka

Shunji Mori
Visiting Professor

Jintae Lee
Associate Professor

Konstantin Kolchin
Visiting Researcher

Tony Y. T. Chan
Assistant Professor

Yu Nakajima
Research Associate

The Image Processing Laboratory engages in research and development of image-based pattern recognition, including some areas of artificial intelligence and database organisation and retrieval. More specifically, as can be seen from the background information and the recent research publications of the members of the laboratory, our current focus is multimedia recognition and retrieval, including character recognition. With the recent growth of the web, a huge amount of unindexed multimedia data has become available to store on our PCs. However, no sophisticated methodology for managing such data has been developed so far, so an index must be attached to each item of data. Our research aim is to develop algorithms for automatic annotation of real-world data, enabling integrated retrieval of multimedia information. The algorithms include self-organisation, transformation among multimedia representations, and feature extraction and recognition of real-world data. Real-world data includes video, still images, speech, music, sound, and text, none of which has been indexed by labels. A software package for integrated multimedia retrieval called CrossMediator was developed by the research group directed by Prof. R. Oka in the ten-year RWC project (1992-2002) of METI (Japan). Some parts of CrossMediator have been brought to the commercial market through a private company. Our laboratory will continue to develop more sophisticated functions, which might open up a new generation of the Internet.

Refereed Journal Papers
This paper describes several methods for recognizing human gestures from time-varying images captured by a single or multiple video cameras. Each method is suited to recognizing human gestures performed in a different situation. The situations include a single person facing a camera, multiple persons captured by an omni-view camera, and so on. The paper also describes an architecture for a real-time dialogue system consisting of speech recognition, task model, CG output, and speech synthesis output modules, which cooperate with a gesture recognition module.
The even-odd parity problem is a tough one for neural networks to handle because they assume a finite-dimensional vector space. Typically, the size of the neural network increases as the size of the problem increases. The triple parity problem is even tougher. In this paper, a method is proposed for supervised and unsupervised learning to classify bit strings of arbitrary length in terms of their triple parity. The learner is modeled by two formal concepts, transformation system and stability optimization. Even though only a small set of short examples was used in the training stage, all bit strings of any length were classified correctly in the online recognition stage. The proposed learner successfully learned to devise a way, by means of metric calculations, to classify bit strings of any length according to their triple parity. The system was able to acquire the concept of counting, dividing, and then taking the remainder by autonomously evolving a set of string-editing rules, along with their appropriate weights, to solve this difficult problem.
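For readers unfamiliar with the task, the target concept can be sketched in a few lines. This is only an illustration of what is being classified, not the learner described in the paper: it assumes that "triple parity" means the number of 1 bits taken modulo 3, which is our reading of "counting, dividing, and then taking the remainder". The function name `triple_parity` is hypothetical.

```python
def triple_parity(bits: str) -> int:
    """Return the number of '1' characters in `bits`, modulo 3.

    Assumption: "triple parity" is the count of 1 bits mod 3.
    The string is scanned once with a three-state counter, so the
    same procedure works for bit strings of any length.
    """
    state = 0
    for ch in bits:
        if ch not in "01":
            raise ValueError(f"not a bit: {ch!r}")
        if ch == "1":
            state = (state + 1) % 3  # advance the three-state counter
    return state


if __name__ == "__main__":
    for s in ["", "1", "11", "111", "1011"]:
        print(repr(s), triple_parity(s))
```

The point of the abstract is that the learner has to discover an equivalent length-independent rule from a few short examples, rather than being given it as above.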
Refereed Proceedings Papers
This paper proposes a new method for writer verification based on so-called Continuous Dynamic Programming. The method belongs to the category of content-free writer verification. It uses an arbitrary partial interval of the registered sequence of strokes. Its use of stroke order differs from that of conventional content-independent methods.
This paper describes the RWC music database, which was developed under the RWC Program to be made widely available to researchers in this field.
A method is proposed for learning to classify vector objects. It combined the Best Stepwise Feature Selection strategy with a Euclidean nearest-neighbor classifier. Time complexities for the various procedures were analysed. Each object was represented by a vector in a fixed D-dimensional Euclidean space. Objects were divided into training and test sets. Nineteen experiments were performed, and their CPU times and accuracies were reported. The proposed naive learner was found to be extremely fast with good error rates. It could be used as a baseline learning agent, in terms of CPU time and accuracy, against which other learning agents can be measured.
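The general idea of pairing stepwise feature selection with a Euclidean nearest-neighbor classifier can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the paper: it reads "Best Stepwise" as greedy forward selection, scores each candidate feature subset by 1-nearest-neighbor accuracy on a held-out set, and stops when no feature improves that score. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nn_predict(train_x: Sequence[Vector], train_y: Sequence[int],
               x: Vector, features: Sequence[int]) -> int:
    """Label of the nearest training vector, using only `features`."""
    best_label, best_dist = train_y[0], math.inf
    for tx, ty in zip(train_x, train_y):
        # Squared Euclidean distance restricted to the chosen features.
        d = sum((tx[f] - x[f]) ** 2 for f in features)
        if d < best_dist:
            best_dist, best_label = d, ty
    return best_label


def accuracy(train_x, train_y, eval_x, eval_y, features) -> float:
    """Fraction of evaluation vectors classified correctly."""
    hits = sum(nn_predict(train_x, train_y, x, features) == y
               for x, y in zip(eval_x, eval_y))
    return hits / len(eval_x)


def forward_select(train_x, train_y, val_x, val_y) -> Tuple[List[int], float]:
    """Greedy forward selection (assumed reading of 'Best Stepwise')."""
    n_features = len(train_x[0])
    selected: List[int] = []
    best_acc = 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(accuracy(train_x, train_y, val_x, val_y, selected + [f]), f)
                  for f in candidates]
        # Highest accuracy wins; ties go to the lowest feature index.
        acc, f = max(scored, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break  # no remaining feature improves held-out accuracy
        selected.append(f)
        best_acc = acc
    return selected, best_acc
```

Each selection round evaluates every remaining feature with a full nearest-neighbor pass, so the cost of this sketch grows with the square of the number of features times the product of the training and held-out set sizes.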
Accepted for publication
Unrefereed Papers
To be published
Academic Activities
Ph.D. and Other Theses
