Information retrieval was the main direction in research conducted by the Software Engineering Lab members.
This year, the focus was on techniques that help make retrieval more intelligent. The goal was to design methods that improve the accuracy of retrieval. Results of our investigations and tests were presented at the 5th International Conference on Future Information Technology sponsored by IEEE, Busan, Korea, May 2010; the International Conference on Intelligent Semantic Web Services and Applications sponsored by ACM, Amman, Jordan, June 2010; and the 3rd International Conference on Human-centric Computing sponsored by IEEE, Cebu, Philippines, August 2010.
The aim of our work in the area of text mining was to improve Web search services. Students of our lab were involved in this research. Results of the research were presented at the 13th International Conference on Humans and Computers, Aizu, December 2010.
In 2009, an agreement on scientific and educational cooperation was signed between our university and Saint Petersburg Polytechnical State University (http://www.spbstu-eng.ru/). The Software Engineering Lab plays a key role in the collaboration activities. Within the framework of the joint research program between our universities, Evgeny Pyshkin, Associate Professor at our partner university, worked in Aizu as a visiting researcher from April to June 2010. Results of the joint research, "On Document Evaluation for Better Context-Aware Summary Generation," were presented at the 2nd International Symposium on Aware Computing, sponsored by IEEE, Tainan, Taiwan.
During his stay, Prof. Pyshkin taught an intensive course on Object-Oriented Software Engineering for our graduate students.
Exchange of Undergraduate Students
Our undergraduate student Mr. Hara visited Saint Petersburg State University, Russia, in April 2010 and presented a paper at the XXXXI Conference on Control Processes and Stability. Russian undergraduate students Mr. Smirnov and Ms. Fedorova attended the HC 2010 Conference in December 2010. This exchange of students took place in accordance with our agreement with Saint Petersburg State University.
In July 2010, our honorary guest Prof. Fredric C. Gey of the University of California, Berkeley, USA, visited our university and gave a talk titled "Varieties of Search: Three Radically Different Search Problems." Prof. Gey is a famous researcher in the area of information retrieval. He was the General Chair of the ACM SIGIR Conference held at Berkeley in 1999.
Our lab, with the support of professors of our university, played a key role in organizing the 13th International Conference on Humans and Computers, held in December 2010 at the University of Aizu. Starting this year, the conference has the status "ACM in Cooperation." The conference proceedings will be included in the ACM Digital Library.
Two foreign master's students joined the lab in autumn 2010. Ms. Julia Legotina from Saint-Petersburg State University, Russia, won a scholarship from the Ministry of Education, Culture, Sports, Science and Technology (MEXT). Mr. Li from Chaoyang University of Technology, Taiwan, enrolled in the dual-degree program (DDP). A DDP is a system in which students can earn two degrees, one from the home university and one from the partner university, through mutual recognition of credits attained at both. The goals of the program include fostering excellent, internationally educated human resources and strengthening relations between partner universities through concrete exchanges. The Memorandum of Understanding establishing the international dual-degree program for students of our university and Chaoyang University of Technology was concluded in 2009.
Vitaly Klyuev and Ai Yokoyama. Web Query Expansion: A Strategy Utilizing Japanese WordNet. Journal of Convergence, 1(1):23-28, 2010.
Nowadays, searching is the most common task performed on the Web. However, Web searching is especially difficult for beginners when they try to utilise a keyword query language. Consequently, beginners usually try to find information with ambiguous queries. Users then receive non-relevant information in response to such queries. Our goal is to make the search process more convenient for them. We assume that the top-ranked pages returned are relevant to the user query. On these pages, we find the most important synonyms and hypernyms for the terms of the user query, utilising Japanese WordNet. We combine these terms, and the new expanded query is then submitted to the search engine. These operations are done automatically by our prototype. This makes the Web searching process easier for beginners. The experimental results showed that our query expansion technique can improve search performance and also has advantages over traditional methods of searching.
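The expansion step described in this abstract can be illustrated with a minimal sketch. It is not the authors' prototype: the toy lexicon stands in for Japanese WordNet lookups, English words are used for readability, and the names TOY_LEXICON, tokenize, and expand_query are hypothetical. The sketch keeps only those synonyms and hypernyms of the query terms that actually occur in the top-ranked pages and orders them by frequency.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for Japanese WordNet lookups (assumption).
TOY_LEXICON = {
    "car": {"synonyms": ["automobile", "auto"], "hypernyms": ["vehicle"]},
    "price": {"synonyms": ["cost"], "hypernyms": ["value"]},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(query, top_pages, lexicon=TOY_LEXICON, max_new_terms=3):
    """Return the query terms plus related terms found in the top-ranked pages."""
    # Count how often each token occurs across the assumed-relevant pages.
    page_counts = Counter(tok for page in top_pages for tok in tokenize(page))
    query_terms = tokenize(query)
    candidates = Counter()
    for term in query_terms:
        entry = lexicon.get(term, {})
        for related in entry.get("synonyms", []) + entry.get("hypernyms", []):
            # Keep a related term only if it is new and appears on the pages.
            if related not in query_terms and page_counts[related] > 0:
                candidates[related] = page_counts[related]
    # Most frequent first; alphabetical order breaks ties deterministically.
    ranked = sorted(candidates, key=lambda t: (-candidates[t], t))
    return query_terms + ranked[:max_new_terms]


pages = [
    "The automobile is a vehicle. This vehicle is cheap.",
    "Cost of a used automobile and vehicle",
]
print(expand_query("car price", pages))
```

Filtering candidates against the returned pages is one simple way to read "most important" in the abstract; the published work may weight terms differently.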
Vitaly Klyuev and Vladimir Oleshchuk. Semantic retrieval: an approach to representing, searching and summarising text documents. Int. J. Information Technology, Communications and Convergence, 1(2):221-234, 2011.
The retrieval efficiency of the presently used systems cannot be significantly improved: the 'bag of words' interpretation loses the semantics of texts. We applied the functional approach to represent English text documents. It allows taking into account semantic relations between words when indexing documents and using ordinary English sentences as queries to a search engine. The proposed retrieval mechanisms return only highly relevant documents. They make it possible to generate content-aware summaries on-the-fly. The presented examples illustrate the advantage of the discussed approach compared to the traditional keyword search.
Min-Hsiang Li, Vitaly Klyuev, and Shih-Hung Wu. Multilingual Sentence Alignment from Wikipedia as Multilingual Comparable Corpora. In Proceedings of the 13th International Conference on Humans and Computers, pages 167-171, Aizu, Japan, December 2010. University of Aizu, ACM in Cooperation.
Bilingual and multilingual dictionaries are necessary resources for machine translation and cross-language information retrieval. With the help of these dictionaries, an information retrieval system can find documents of similar content in different languages. Maintaining such dictionaries is an interesting research topic. Researchers can collect multilingual parallel corpora from the Internet and find the translations of new words. Therefore, parallel corpora can help machine translation and cross-language information retrieval. Sentence alignment of parallel corpora is a way to mine the necessary knowledge. In the real world, however, many documents are available only as comparable corpora. Therefore, we introduce a technique for the extraction of parallel sentences from Wikipedia as multilingual comparable corpora.
Evgeny Pyshkin and Vitaly Klyuev. On Document Evaluation for Better Context-Aware Summary Generation. In Proceedings of The 2nd International Symposium on Aware Computing, sponsored by IEEE, 5 pp., Tainan, Taiwan, November 2010. IEEE.
Improving the quality of summaries created for the documents explored during informational search is an essential point in the interpretation of search engine results. In many cases, summaries generated by search engines are not indicative enough, so the user cannot quickly judge the relevance of a document. Taking the WordNet ontology-based weighting procedure proposed in earlier works as a starting point, an improved metric for better summary generation is introduced. This evaluation approach takes into account not only ontologically related weights for each term in the query, but also the relative distribution of terms with regard to their relevance to the document paragraphs.
Vitaly Klyuev and Vladimir Oleshchuk. A Novel Approach to Improve the Accuracy of Web Retrieval. In Proceedings of The 5th IEEE International Conference on Future Information Technology, 5 pp., Busan, Korea, May 2010. IEEE, FTRA.
In this paper, we introduce a novel approach to improve the accuracy of Web retrieval. We utilize WordNet and the WordNet SenseRelate All Words software as the main tools to preserve the semantics of the sentences of documents and user queries. Nouns and verbs in WordNet are organized in tree hierarchies. The word meanings are represented by numbers that refer to nodes in the semantic tree. The meaning of each word in the sentence is calculated when the sentence is analyzed. The goal is to put each noun and verb of the sentence in the right place in the tree. Taking this information into account, it is possible to solve the ambiguity problem for the query keywords and to create indicative summaries that take into account the query words and semantically related hypernyms and synonyms.
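The idea of placing a word at a numbered node of a hypernym tree can be sketched as follows. This is a simplified illustration under stated assumptions, not the WordNet SenseRelate algorithm: the miniature tree, the node numbers, and the names TREE, SENSES, path_to_root, and disambiguate are all hypothetical. Each candidate sense of an ambiguous word is scored by how many ancestors it shares with the senses of the context words, and the best-scoring node is chosen.

```python
# Hypothetical miniature hypernym tree: node id -> (label, parent id or None).
TREE = {
    0: ("entity", None),
    1: ("organization", 0),
    2: ("financial institution", 1),
    3: ("bank (institution)", 2),
    4: ("geological formation", 0),
    5: ("slope", 4),
    6: ("bank (river side)", 5),
    7: ("shore", 4),
    8: ("lender", 2),
}

# Hypothetical sense inventory: word -> candidate node ids in TREE.
SENSES = {"bank": [3, 6], "shore": [7], "lender": [8]}


def path_to_root(node):
    """Return the node ids from the given node up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = TREE[node][1]
    return path


def shared_ancestors(a, b):
    """Count the nodes that the two root paths have in common."""
    return len(set(path_to_root(a)) & set(path_to_root(b)))


def disambiguate(word, context_words):
    """Pick the sense of `word` that shares the most ancestors with the context."""
    def score(sense):
        return sum(
            max(shared_ancestors(sense, other) for other in SENSES[ctx])
            for ctx in context_words
            if ctx in SENSES
        )
    # max() returns the first best candidate, so ties fall to the earlier sense.
    return max(SENSES[word], key=score)


chosen = disambiguate("bank", ["shore"])
print(chosen, TREE[chosen][0], path_to_root(chosen))
```

Once a word is fixed to a node, its hypernyms are simply the remaining entries of path_to_root, which is what makes the tree position useful for expanding queries and building summaries.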
Ai Yokoyama and Vitaly Klyuev. Search Engine Query Expansion using Japanese WordNet. In Proceedings of The 3rd International Conference on Human-centric Computing, 5 pp., Cebu, Philippines, August 2010. IEEE, FTRA, KITCS.
We assume that the top-ranked pages returned by a search engine are relevant to the user query. We find the most important terms on these pages. In addition, we get synonyms and hypernyms for the terms of the user query utilizing Japanese WordNet. We combine these words, and the expanded query is then submitted to the search engine. The experimental results showed that our query expansion technique can improve search performance and has several advantages.
Yulia Legotina and Vitaly Klyuev. Natural Language Processing Tool to Support Web Search. In Proceedings of the 13th International Conference on Humans and Computers, pages 172-176, Aizu, Japan, December 2010. University of Aizu, ACM in Cooperation.
The amount of information on the Internet is increasing rapidly, so that Internet users are often faced with the difficulty of determining exactly what to take into account. As a result, most of them do not look at more than one page retrieved by a search engine. Our goal is to make their search process more precise and thus more enjoyable. We are focusing on intelligent retrieval, which shifts the focus from the keyword language, understandable mainly by computers, to natural language that better reflects the real user needs. For this purpose, we review the tools that already exist and introduce an algorithm that permits them to be used together.
Sayuri Ebisu and Vitaly Klyuev. An Approach to Generate Indicative Summaries for Japanese Documents. In Proceedings of the International Conference on Intelligent Semantic Web Services and Applications sponsored by ACM, pages 14-20, Jordan, June 2010. Isra University.
Finding appropriate information on the Internet is still difficult. From the end user's point of view, there are two key reasons for that. One is that the queries must be expressed in an artificial form such as a set of key words. The other is that the search results display only snippets containing the search terms. These snippets, however, are insufficient for determining result relevance as they do not really summarize the content of the document they represent. Search engines should thus generate indicative summaries that help the user understand the content of documents without downloading them. In this paper, we propose an approach that selects the most important sentences semantically relevant to the user query and derives the paragraphs including them. This approach is applicable to the Japanese language. Our experiments show promising results.
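The selection of query-relevant sentences and their enclosing paragraphs can be sketched roughly as below. The sketch makes several assumptions that are not in the paper: relevance is approximated by plain term overlap, English text with whitespace tokenization is used (Japanese text would first need a morphological analyzer), and the names split_sentences, score, and indicative_summary are hypothetical. The optional related_terms argument is where synonyms or hypernyms of the query words could be supplied.

```python
import re


def split_sentences(paragraph):
    """Split a paragraph into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def score(sentence, terms):
    """Count how many distinct query or related terms occur in the sentence."""
    words = set(re.findall(r"\w+", sentence.lower()))
    return len(words & terms)


def indicative_summary(paragraphs, query, related_terms=(), top_k=1):
    """Return the paragraphs that contain the top_k best-matching sentences."""
    terms = {t.lower() for t in query.split()} | {t.lower() for t in related_terms}
    scored = []
    for p_idx, paragraph in enumerate(paragraphs):
        for sentence in split_sentences(paragraph):
            scored.append((score(sentence, terms), p_idx, sentence))
    # Highest score first; ties go to the earlier paragraph.
    scored.sort(key=lambda item: (-item[0], item[1]))
    chosen = [item for item in scored[:top_k] if item[0] > 0]
    paragraph_ids = sorted({p_idx for _, p_idx, _ in chosen})
    return [paragraphs[i] for i in paragraph_ids]


document = [
    "Aizu is a city in Fukushima. It has a castle.",
    "Search engines rank pages. Query expansion adds related terms to a query.",
]
print(indicative_summary(document, "query expansion"))
```

Returning whole paragraphs rather than isolated sentences keeps the surrounding context, which is the property the abstract asks of an indicative summary.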
V. Klyuev, Apr. 2010.
Member, IEEE, ACM, IEICE
Arisa Takahashi. Graduation Thesis: Query Expansion Using WordNet and Stanford Parser, School of Computer Science and Engineering, March 2011.
Thesis Adviser: V. Klyuev
Ryo Ueno. Graduation Thesis: Semantic Search Engine Query Expansion using WordNet, School of Computer Science and Engineering, March 2011.
Thesis Adviser: V. Klyuev
Kazuhiro Wakisaka. Graduation Thesis: Query Expansion Using Wikipedia and Wikipedia Miner, School of Computer Science and Engineering, March 2011.
Thesis Adviser: V. Klyuev
Yoshiyuki Asano. Graduation Thesis: Query Expansion: Comparison between WordNet and Wikipedia for Information Retrieval, School of Computer Science and Engineering, March 2011.
Thesis Adviser: V. Klyuev