Professor
Associate Professor
Information Retrieval Direction
This year, the focus was on Web search techniques applied to Japanese and Chinese texts. Results of our investigations and tests were presented at the NTCIR-6 Workshop. The aim of our work in the area of text mining was to improve Web search services. The retrieval efficiency of the presently used search tools cannot be significantly improved: a "bag of words" interpretation loses the semantics of texts. The functional approach to representing English texts in computer memory makes it possible to keep the semantic relations between words and to use ordinary English sentences as queries. A prototype system based on this approach was proposed. Our study was supported by a grant from the Japan Science and Technology Agency and a grant from the University of Aizu Competitive Research Funding. Students of our lab were involved in this research. Results of our study were presented at the 7th IEEE International Conference on Computer and Information Technology (CIT 2007), the 37th Annual IEEE Frontiers in Education Conference (FIE 2007), and the 10th International Conference on Humans and Computers (HC 2007).
Program for Leading Edge IT Specialists

International Relations
International Project on Text Mining
Exchange of undergraduate students

[vkluev-01:2007]
M. Mozgovoy, V. Tuzov, and V. Klyuev. A Fast Semantic-Powered
Plagiarism Detection System. The Journal of Three Dimensional Images,
21(2):105-110, 2007.
Plagiarism detection systems have been known in the university community for years. However,
most existing detectors for natural language texts use rather simple
comparison methods that make instances of plagiarism easy to hide. Software
designed to detect plagiarism in computer programs uses far more advanced
techniques. We propose a method that adds functionality similar to tokenization
and tree matching to detectors for natural language texts. Applying this method in
practice requires considerable effort, but it can make use of existing
software for parsing and word sense disambiguation.
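The benefit of normalizing words before comparison can be illustrated with a small Python sketch. It assumes a hand-written table that maps synonyms to one sense identifier (a hypothetical stand-in for the parsing and word sense disambiguation software mentioned above) and uses the standard library's difflib.SequenceMatcher as the comparison step; neither is the method of the paper. A paraphrase that only swaps synonyms scores low on raw words but matches fully after normalization.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table: a stand-in for parsing plus word sense
# disambiguation. Words with the same meaning share one sense identifier.
SENSES = {
    "big": "SIZE_LARGE", "large": "SIZE_LARGE",
    "car": "VEHICLE", "automobile": "VEHICLE",
    "bought": "PURCHASE", "purchased": "PURCHASE",
}


def tokenize(text: str, normalize: bool = True) -> list[str]:
    """Lowercase, drop basic punctuation, and optionally map words to sense IDs."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return [SENSES.get(w, w) for w in words] if normalize else words


def similarity(a: str, b: str, normalize: bool = True) -> float:
    """Similarity ratio in [0, 1] between the two token sequences."""
    return SequenceMatcher(None, tokenize(a, normalize), tokenize(b, normalize)).ratio()


original = "John bought a big car."
rewritten = "John purchased a large automobile."
print(round(similarity(original, rewritten, normalize=False), 2))  # 0.4
print(round(similarity(original, rewritten), 2))                   # 1.0
```

Only "john" and "a" survive a raw word comparison, while the normalized sequences are identical.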
|
[vkluev-02:2007]
V. Klyuev, T. Tsushimoto, and G. Nikishkov. Using a Web-Based System
to Support Teaching Processes. International Journal of Information and
Communication Technology Education, 4(1):72-85, 2008.
This article presents a platform-independent Java Web application named TSI (Teacher-Student
Interaction) that supports communication between an instructor, teaching assistants, and
students in a traditional on-campus course. Using the TSI, the instructor and teaching
assistants can handle most of the routine work: uploading students' personal information,
sending students personal e-mails, etc. The system can easily be installed and administered
by an individual instructor with little computer experience. For students, it is as simple
to use as a pen: they can check their personal data (scores and comments), download
educational materials, etc. As part of the TSI, a VBA application is used to analyze the
course log files. This tool helps in understanding the behavior of individual students and
of groups. The TSI was successfully tested for four years at the University of Aizu (Japan),
in an environment where English is one of the working languages and both students and
professors are non-native speakers of English.
[vkluev-03:2007]
T. Saito and V. Klyuev. Comparison of the Index Size of Google,
Yahoo!, Ask, and MSN. In Proceedings of the Tenth International Conference
on Humans and Computers, pages 196-199. University of Aizu, Dec. 2007.
We applied the technique proposed by M. Cheney and M. Perry to evaluate the index
sizes of Google, Yahoo!, MSN, and Ask.
|
[vkluev-04:2007]
K. Sato and V. Klyuev. Finding Advertising Keywords on Japanese
Web Pages. In Proceedings of the Tenth International Conference on Humans and Computers,
pages 187-189. University of Aizu, Dec. 2007.
We investigated several aspects of extracting appropriate advertising keywords from
Japanese Web pages.
|
[vkluev-05:2007]
E. Nagaoka and V. Klyuev. An On-line Directory for Students. In
Proceedings of the Tenth International Conference on Humans and Computers,
pages 190-191. University of Aizu, Dec. 2007.
We propose several improvements to recently introduced tools such as Google
Notebook by Google and My Stuff by Ask3D.
|
[vkluev-06:2007]
V. Klyuev and V. Oleshchuk. Semantic Retrieval of Text Documents. In
7th IEEE International Conference on Computer and Information Technology
(CIT 2007), pages 189-193. IEEE Computer Society, Oct. 2007.
Nowadays, the Internet is the major source of information for millions of people.
There are many search tools available on the net, but finding appropriate text information
is still difficult. The retrieval efficiency of the presently used systems cannot
be significantly improved: the "bag of words" interpretation loses the semantics of
texts. We applied the functional approach to represent English text documents in
computer memory. This made it possible to keep the semantic relations between words when
indexing documents and to use ordinary English sentences as queries submitted to a
search engine. The proposed retrieval algorithm returns highly relevant documents.
The presented example illustrates the advantage of this approach over traditional
keyword search.
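The loss of semantics under a bag-of-words interpretation can be shown with a two-sentence example. The sketch below assumes three-word subject-verb-object sentences and uses a plain (subject, verb, object) tuple as a toy relation-preserving representation; this is a hypothetical simplification for illustration, not the functional approach described in the paper.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Keyword-style representation: word counts only; order and roles are discarded."""
    return Counter(sentence.lower().split())


def svo(sentence: str) -> tuple[str, str, str]:
    """Toy relation-preserving representation, assuming a three-word
    'subject verb object' sentence (hypothetical; no real parsing is done)."""
    subject, verb, obj = sentence.lower().split()
    return (subject, verb, obj)


documents = ["dog bites man", "man bites dog"]
query = "man bites dog"

# Keyword matching: a document matches if it contains every query word.
keyword_hits = [d for d in documents if not (bag_of_words(query) - bag_of_words(d))]
# Relation matching: subject, verb and object must all agree.
relation_hits = [d for d in documents if svo(d) == svo(query)]

print(keyword_hits)   # ['dog bites man', 'man bites dog']
print(relation_hits)  # ['man bites dog']
```

Both documents have the same bag of words, so keyword matching cannot tell who bit whom; keeping the relation between the words can.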
|
[vkluev-07:2007]
M. Mozgovoy, S. Karakovsky, and V. Klyuev. Fast and Reliable Plagiarism
Detection System. In 37th Annual IEEE Frontiers in Education Conference.
IEEE Computer Society, Oct. 2007.
Plagiarism and similarity detection software has been well known in universities for years.
Despite the variety of methods and approaches used in plagiarism detection, the
typical trade-off between the speed and the reliability of the algorithm remains.
We introduce a new two-step approach to plagiarism detection that combines high
algorithmic performance with the quality of pairwise file comparison. Our system uses
a fast detection method to select only suspicious files, and then invokes precise (and
slower) algorithms to obtain reliable results. We show that the proposed method does not
noticeably reduce the quality of the pairwise comparison mechanism while providing
better speed.
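The two-step idea (a cheap filter followed by a precise comparison only on suspicious pairs) can be sketched as follows. The specific techniques are assumptions chosen for illustration, not the algorithms of the paper: Jaccard overlap of word 3-gram shingles serves as the fast filter, difflib.SequenceMatcher as the slower pairwise comparison, and the function names and thresholds are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of word n-grams: cheap to build and to compare."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Fast set-overlap score used only as a filter."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def precise_similarity(a: str, b: str) -> float:
    """Slower, order-aware comparison of the two token sequences."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def detect(files: dict[str, str], filter_threshold: float = 0.2,
           report_threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """Step 1: keep pairs whose shingle overlap reaches filter_threshold.
    Step 2: run the precise comparison on those suspicious pairs only."""
    sigs = {name: shingles(text) for name, text in files.items()}
    suspicious = [(x, y) for x, y in combinations(sorted(files), 2)
                  if jaccard(sigs[x], sigs[y]) >= filter_threshold]
    results = []
    for x, y in suspicious:
        score = precise_similarity(files[x], files[y])
        if score >= report_threshold:
            results.append((x, y, round(score, 2)))
    return results


files = {
    "a.txt": "the quick brown fox jumps over the lazy dog near the river bank",
    "b.txt": "the quick brown fox jumps over the lazy cat near the river bank",
    "c.txt": "information retrieval systems index documents to answer user queries",
}
print(detect(files))  # [('a.txt', 'b.txt', 0.92)]
```

Only the pair (a.txt, b.txt) passes the filter here, so the expensive comparison runs once instead of three times; with many unrelated files the savings grow accordingly.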
|
[vkluev-08:2007]
V. Klyuev. OASIS at NTCIR-6: On-line Query Translation for Chinese-Japanese
Cross-Lingual Information Retrieval. In NTCIR Workshop 6 Meeting on Evaluation
of Information Access Technologies: Information Retrieval, Question Answering and
Cross-Lingual Information Access, pages 85-91, Tokyo, May 2007. National Institute
of Informatics.
The aim of this study is to investigate how efficiently on-line translation systems
handle the polysemy of query terms when they translate them automatically. This paper
reports the results of Chinese-Japanese CLIR experiments using on-line query translation
techniques. Approaches that use English as a pivot language and several on-line
translation systems are introduced. They were tested on the NTCIR-3, 4, 5, and 6
collections and can be helpful under certain circumstances.
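The polysemy problem that a pivot language introduces can be seen in a toy example. The dictionaries below are hypothetical stand-ins for on-line translation systems: the Chinese 银行 refers only to a financial bank, but its English pivot "bank" is ambiguous, so translating on to Japanese yields both 銀行 (financial bank) and 土手 (embankment).

```python
# Hypothetical toy dictionaries standing in for on-line translation systems.
ZH_TO_EN = {"银行": ["bank"], "苹果": ["apple"]}
EN_TO_JA = {"bank": ["銀行", "土手"], "apple": ["りんご"]}


def pivot_translate(term: str) -> list[str]:
    """Translate a Chinese term into Japanese candidates via English."""
    candidates: list[str] = []
    for english in ZH_TO_EN.get(term, []):
        for japanese in EN_TO_JA.get(english, []):
            if japanese not in candidates:  # keep order, skip duplicates
                candidates.append(japanese)
    return candidates


def translate_query(terms: list[str]) -> dict[str, list[str]]:
    """Translate every query term independently."""
    return {term: pivot_translate(term) for term in terms}


print(translate_query(["银行", "苹果"]))
# {'银行': ['銀行', '土手'], '苹果': ['りんご']}
```

The spurious candidate 土手 enters the Japanese query only because of the English pivot, which is the kind of error a CLIR system has to manage.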
|
[vkluev-09:2007]
M. Sampei and V. Klyuev. Finding Stop Words for Japanese. In
Proceedings of the Tenth International Conference on Humans and Computers,
pages 192-195. University of Aizu, Dec. 2007.
In this study, we applied a statistical approach to find stop words for Japanese.
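The abstract does not specify which statistic was used. As one common assumption, the sketch below treats tokens that occur in a large fraction of documents (high document frequency) as stop-word candidates. It also assumes the Japanese text has already been segmented into words, for example by a morphological analyzer, since Japanese is written without spaces; the sample documents and the 0.7 threshold are made up for illustration.

```python
from collections import Counter


def document_frequency(docs: list[list[str]]) -> dict[str, float]:
    """Fraction of documents in which each token appears at least once."""
    counts = Counter(token for doc in docs for token in set(doc))
    return {token: n / len(docs) for token, n in counts.items()}


def stop_word_candidates(docs: list[list[str]], threshold: float = 0.7) -> list[str]:
    """Tokens whose document frequency reaches the threshold, most frequent first."""
    df = document_frequency(docs)
    return sorted((t for t, f in df.items() if f >= threshold), key=lambda t: (-df[t], t))


# Made-up, already segmented Japanese sentences.
docs = [
    ["東京", "は", "日本", "の", "首都", "です"],
    ["会津", "は", "福島", "の", "都市", "です"],
    ["大学", "の", "研究", "は", "情報", "検索", "です"],
    ["検索", "エンジン", "の", "索引"],
]
print(stop_word_candidates(docs))  # ['の', 'です', 'は']
```

Function words such as の, は and です appear in almost every document, while content words such as 検索 stay below the threshold.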
[vkluev-10:2007] |
V. Klyuev. Using English for Queries: An Approach to Implementing
an Intelligent Web Search. In IPSJ SIG-DD/FI, pages 47-51, Tokyo, Mar. 2008.
IPSJ.
The retrieval efficiency of the presently used search tools cannot be significantly improved:
a "bag of words" interpretation loses the semantics of texts. The functional
approach to representing English texts in computer memory makes it possible to keep the
semantic relations between words and to use ordinary English sentences as queries. A
prototype system based on this approach is presented.
[vkluev-11:2007] |
V. Klyuev. A grant from the Japan Science and Technology Agency,
2007. |
[vkluev-12:2007] |
V. Klyuev. A grant from the University of Aizu Competitive Research
Funding, 2007.
[vkluev-13:2007] |
V. Klyuev, Apr. 2007. Member, IEEE, ACM |
[vkluev-14:2007] |
Emu Nagaoka. Graduation Thesis: Comparing Online Directories for
Students, University of Aizu, 2008.
Thesis Advisor: Klyuev, V |
|
[vkluev-15:2007] |
Keisuke Sato. Graduation Thesis: Finding Advertising Keywords on
Japanese Web Pages, University of Aizu, 2008.
Thesis Advisor: Klyuev, V |
|
[vkluev-16:2007] |
Masashiro Sampei. Graduation Thesis: Finding Stop Words for
Japanese, University of Aizu, 2008.
Thesis Advisor: Klyuev, V |
|
[vkluev-17:2007] |
Tatsuya Saito. Graduation Thesis: Comparison of the Index Size of
Google, Yahoo!, Ask, and MSN, University of Aizu, 2008.
Thesis Advisor: Klyuev, V |