Shunji Mori / Professor
Hirobumi Nishida / Associate Professor
Tony Y.T. Chan / Assistant Professor
Yu Nakajima / Research Associate
First and foremost, the Image Processing Laboratory engages in research and development of image pattern recognition systems. More specifically, as the backgrounds and recent research publications of the laboratory members show, character recognition is our current focus. With the recent rise of multimedia systems, character recognition has attracted the attention of many researchers and engineers. Character recognition techniques are generally divided into off-line and on-line methods. Off-line recognition is the typical form of character recognition and aims at duplicating the human ability to recognize characters. On-line character recognition has also attracted attention recently in connection with so-called pen computers, because on-line methods provide a very flexible, convenient, and natural human interface. Historically, the two kinds of technique have been developed separately, but in principle they can be developed together. A common approach makes recognition more flexible, in that the constraints usually imposed on on-line techniques can be removed. Typical examples of such constraints are the writing order and the number of strokes constituting a character.
In this respect, one of our members, Prof. Nishida, is a world-class researcher and engineer. He has published two regular papers in IEEE Trans. on Pattern Analysis and Machine Intelligence, one of the most authoritative professional journals in the world. Considering his young age, this is amazing. When he worked for the Ricoh Research and Development Center, an OCR group in which Prof. Nishida played an important role developed a very powerful OCR package. One of the members of the OCR group was Mr. Nakajima, who has joined us as a research associate. He is well acquainted with different computer systems and also works as an instructor in programming. The technique developed is called filtering in character recognition, and it makes the package very fast without being hardware-specific. Prof. Nishida and the OCR group were awarded the Excellence in Programming Prize from the Ministry of Posts and Telecommunications. The basic configuration of the system was proposed by Prof. Mori when he was head of the Artificial Intelligence Research Center of Ricoh.
Naturally, the essential point of the developed system lies in its excellent character recognition algorithm, which is based on an algebraic approach to shape description. This approach was investigated as basic research by Prof. Mori when he worked for the Electrotechnical Laboratory of the Ministry of International Trade and Industry. Later this basic line was further developed, in theory and in practice, by Prof. Nishida. We think that this is one of the most elegant theories in character recognition. Because of its theoretical richness and high level of abstraction, the model has very broad applications in shape recognition in general. It is applicable to both off-line and on-line character recognition systems, for example. A further extension of the theory is its application to the recognition of three-dimensional bodies, in which pieces of surfaces must be represented effectively in terms of pattern recognition and a global organization mechanism. For this purpose, a set of primitive surfaces must be considered. Furthermore, in addition to basic bottom-up recognition mechanisms, rich knowledge concerning "objects" must naturally be employed. Some knowledge representation methods developed in the field of artificial intelligence research are being considered. Assistant Professor Chan is investigating the unification of pattern recognition and artificial intelligence approaches based on models proposed by Prof. Lev Goldfarb of the University of New Brunswick, Prof. Nishida, and himself.
On the other hand, any abstract theory must have its raison d'etre in the real world. The proof is in its utility. In this sense, experiments and computer simulations are essential research components. To that end, we need powerful facilities such as observation, display, interface, and calculation systems. Building such an integrated experimental system is itself very important research work. In turn, the system can easily be applied to the top-down education system at the University of Aizu.
At any rate, the performance of current image recognition systems is far below that of human beings, who can easily recognize very distorted handwritten characters and degraded, noisy printed characters without any heuristic learning. This ability remains most mysterious. From a long-term historical perspective, therefore, we are still only at the dawn. Much work remains, in particular for ambitious young students, researchers, and engineers. We will pursue this very interesting research field along the lines mentioned above.
Refereed Journal Papers
We present an algebraic approach to the inductive learning of structural models and the automatic construction of shape prototypes for character recognition, on the basis of the algebraic description of curve structure proposed by Nishida and Mori (1992). A class in the structural models is a set of shapes that can be transformed continuously into each other. We consider an algebraic representation of the continuous transformation of components of the shape, and give specific properties satisfied by each component in the class. The generalization rules in the inductive learning are specified from the viewpoints of the continuous transformation of components and the relational structure among the components. The learning procedure generalizes a pair of classes into one class incrementally and hierarchically in terms of the generalization rules. We show experimental results on handwritten numerals.
Recognition of handwritten character strings is a challenging problem, because we need to cope with variations of shapes and touching/breaking of characters at the same time. A natural approach to recognizing such complex objects is as follows: The object is decomposed into segments, and meaningful partial shapes (shapes which are recognized as some characters) are constructed by merging segments locally. Then, a globally consistent interpretation of the object is determined from the combination of partial shapes. This approach can be referred to as a model-based split-and-merge method. Based on this idea, we present an algorithm for recognition and segmentation of character strings. We give systematic performance statistics by experiments using handwritten numerals. This algorithm can be applied to character strings composed of any number of characters and any types of touching or breaking, whether the number of constituent characters is known or unknown.
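The split-and-merge idea in the abstract above can be illustrated with a minimal sketch. It is not the published algorithm: the dynamic-programming search, the additive score, the `max_merge` limit, and the toy recognizer with its segment names are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical recognizer type: returns (label, score) for a merged partial
# shape, or None when the shape is not recognized as any character.
Recognizer = Callable[[Tuple[str, ...]], Optional[Tuple[str, float]]]


def interpret(segments: Sequence[str], recognize: Recognizer,
              max_merge: int = 3) -> Optional[Tuple[str, float]]:
    """Return the best-scoring labeling that uses every segment exactly once."""
    n = len(segments)
    # best[i] holds (labels, total score) for the first i segments, or None
    # when no consistent interpretation of that prefix exists.
    best: List[Optional[Tuple[str, float]]] = [None] * (n + 1)
    best[0] = ("", 0.0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_merge), end):
            if best[start] is None:
                continue
            # Merge consecutive segments locally into one partial shape.
            result = recognize(tuple(segments[start:end]))
            if result is None:
                continue
            label, score = result
            candidate = (best[start][0] + label, best[start][1] + score)
            if best[end] is None or candidate[1] > best[end][1]:
                best[end] = candidate
    return best[n]


# Toy stand-in for a real classifier; segment names and scores are made up.
TOY_MODELS = {
    ("a",): ("1", 0.2),
    ("a", "b"): ("4", 0.9),
    ("b",): ("1", 0.5),
    ("c",): ("7", 0.8),
    ("b", "c"): ("0", 0.7),
}


def toy_recognize(shape: Tuple[str, ...]) -> Optional[Tuple[str, float]]:
    return TOY_MODELS.get(shape)
```

With these toy models, `interpret(["a", "b", "c"], toy_recognize)` prefers merging segments `a` and `b` into one character over reading all three segments separately. The number of characters is not fixed in advance, which mirrors the claim that the number of constituent characters may be unknown.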
The prime difficulty in research and development of handwritten character recognition systems lies in the variety of shape deformations. In particular, over more than a quarter of a century of research, it has been found that some qualitative features such as quasi-topological features (convexity and concavity), directional features, and singular points (branch points and crossings) are effective in coping with variations of shapes. On the basis of this observation, Nishida and Mori (1992) proposed a method for the structural description of character shapes by a few components with rich features. This method is clear and rigorous, can cope with various deformations, and has been shown to be powerful in practice. Furthermore, shape prototypes (structural models) can be constructed automatically from the training data (Nishida and Mori, 1993). However, in the analysis of directional features, the number of directions is fixed to four, and more directions such as 8 or 16 cannot be dealt with. To widen the applications of the Nishida-Mori method, we present a method for structural analysis and description of simple arcs or closed curves based on 2m-directional features (m = 2, 3, 4, ...) and convex/concave features. Meanwhile, software OCR systems that need no specialized hardware have attracted much attention recently. Based on the proposed method of structural analysis and description, we describe a software implementation of a handwritten character recognition system using a multi-stage strategy.
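To make the notion of a configurable number of directions concrete, the following sketch quantizes the direction of each step along a curve into one of `num_directions` codes (four, 8, or 16, as mentioned above). This is a plain chain-code style quantization used as a simpler stand-in; it is not the structural description of Nishida and Mori, and the function names and the code numbering are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def direction_code(dx: float, dy: float, num_directions: int = 4) -> int:
    """Quantize the direction of vector (dx, dy) into one of num_directions codes.

    Code 0 is centered on the positive x axis and codes increase
    counterclockwise (an assumed convention).
    """
    if dx == 0 and dy == 0:
        raise ValueError("zero vector has no direction")
    sector = 2 * math.pi / num_directions
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # Shift by half a sector so that each code is centered on its direction.
    return int((angle + sector / 2) // sector) % num_directions


def chain_codes(points: Sequence[Tuple[float, float]],
                num_directions: int = 4) -> List[int]:
    """Direction code of each step between consecutive points of a curve."""
    return [
        direction_code(x1 - x0, y1 - y0, num_directions)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
```

For example, `chain_codes([(0, 0), (1, 0), (1, 1), (0, 1)])` describes a path that goes right, up, and then left. Raising `num_directions` to 8 or 16 gives a finer description of the same curve at the cost of more sensitivity to small deformations.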
We propose the Cross Section Sequence Graph, which describes line images in a simple and well-structured form. It is composed of regular regions called cross section sequences and singular regions. A cross section sequence is a sequence of cross sections, each of which is constructed as a pair of boundary points almost perpendicular to the direction of the line. The sequence corresponds to a straight or curved line segment. The remaining regions are extracted as singular regions, each of which corresponds to an end point region, a corner, and so on. The cross section sequence graph is useful for many kinds of feature extraction, especially for skeletonization, since a singular region can be analyzed from adjacent regular regions. Experimental results show that the skeleton extracted from the cross section sequence graph is better than that of a pixel-wise skeletonization (thinning) in terms of both processing speed and the quality of the skeleton.
Refereed Proceeding Papers
In this paper, we propose a simple and effective scheme for combining different algorithms, called a "recognition filter." The scheme imposes strong restrictions on each algorithm with respect to substitution error rate and recognition speed. To satisfy these restrictions, we used a decision tree framework with model-based restriction. The algorithm made no substitution errors on the 13,000 handwritten numerals that had been used to construct the tree. On about 143,000 numerals from other data sets, it made only 16, all of which can be considered "permissible" errors, or rather human mistakes. The recognition speed is about 260 cps on a SPARC IPX (28.5 MIPS). This algorithm also has the property that every revision improves the performance without side effects. This algorithm was combined with another OCR algorithm. The combined system was about 3 times faster than the latter algorithm alone, and both its recognition rate and its substitution rate were improved. The combined system was incorporated into a product of Ricoh Co., Ltd.
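One way to read the combination described above is as a cascade: a fast first stage answers only when it is sure and otherwise rejects, and rejected patterns are passed to a slower general-purpose recognizer. The sketch below shows that reading only. The cascade structure, the `combine` function, and both toy stages are assumptions for illustration; they are not the decision tree used in the paper.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stage types: the filter may reject by returning None,
# while the fallback recognizer always returns some label.
FilterStage = Callable[[str], Optional[str]]
FallbackStage = Callable[[str], str]


def combine(fast_filter: FilterStage,
            fallback: FallbackStage) -> Callable[[str], Tuple[str, str]]:
    """Build a recognizer that tries the filter first, then the fallback."""
    def recognize(pattern: str) -> Tuple[str, str]:
        label = fast_filter(pattern)
        if label is not None:
            # The filter is trusted because it is tuned to reject rather
            # than risk a substitution error.
            return label, "filter"
        return fallback(pattern), "fallback"
    return recognize


# Toy stages; real stages would operate on character images.
TEMPLATES: Dict[str, str] = {"|": "1", "O": "0"}


def toy_filter(pattern: str) -> Optional[str]:
    # Accept only exact matches with known templates; reject everything else.
    return TEMPLATES.get(pattern)


def toy_fallback(pattern: str) -> str:
    # Crude slow-path stand-in that always produces an answer.
    return "0" if "o" in pattern.lower() else "1"


recognize = combine(toy_filter, toy_fallback)
```

Under this reading, the speedup comes from the share of patterns that the filter settles on its own, and the overall substitution rate cannot get worse on those patterns as long as the filter itself makes almost no substitution errors.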
In this paper, we present a model of the structural transformation of handwritten characters in terms of singular points and stroke continuation. On the basis of the model, we present an algorithm for generating deformed patterns from a given initial shape. Some potential applications in handwritten character recognition are discussed.
Nishida (1992) proposed a clear, rigorous, and powerful method for the structural description of character shapes in terms of quasi-topological features (convexity and concavity), directional features, and singular points (branch points and crossings). Shapes are described by a few components with rich features, and shape prototypes (structural models) can be constructed automatically from the training data (Nishida, 1993). However, the number of directions is fixed to four, and more directions such as 8 or 16 cannot be dealt with. To widen the applications of Nishida's method, we present a method for structural analysis and description of simple arcs or closed curves based on 2m-directional features (m = 2, 3, 4, ...) and convex/concave features. Software OCR systems that need no specialized hardware have attracted much attention recently. Based on the method of structural analysis and description, we describe a software implementation of a handwritten character recognition system with a multi-stage strategy.
The prime difficulty in research and development of handwritten character recognition systems lies in the variety of shape deformations. The key to recognizing such complex objects as handwritten characters is a shape description that is robust against shape deformation, together with quantitative estimation of the amount of deformation. In this paper, on the basis of the structural description by Nishida (1992), we propose a shape matching algorithm and a method of analysis and description of shape transformation for handwritten characters. The object is described in terms of a qualitative and global structure that is robust against deformation, and it is matched against the built-in models. On the basis of the correspondence of components between the object and the model, geometrical and statistical transformations are estimated, and the decision of recognition or rejection is made based on these estimates. Structural description and geometrical/statistical transformation are integrated in a systematic way. Experimental results are shown for handwritten digit recognition and on-line handwriting recognition.
Reviewer of IEEE Transactions on Pattern Analysis and Machine Intelligence.
Program Co-Chair of the Second International Conference on Document Analysis and Recognition.
The handwritten character recognition system using algorithms by H. Nishida and Y. Nakajima was selected for the Excellent System Award in the Second IPTP (Institute for Posts and Telecommunications Policy, Ministry of Posts and Telecommunications, Japan) Competition on Character Recognition Technology.