Professor
Associate Professor
Visiting Researcher

Each member of the Human Interface Laboratory has his or her own interests and
carries out independent research activities.

Prof. Masahide Sugiyama:
Prof. Jie Huang:
[j-huang-01:2009]
A. Saji, K. Tanno, and J. Huang. Reproduction 3-D Sound by Measuring
and Construction of HRTF with Room Reverberation. In AES 127th
Convention. AES, Oct. 2009.
In this paper, we proposed a new method using HRTFs that contain room
reverberation (R-HRTFs). The reverberation is not added to a dry sound source after
filtering with the HRTF, but is captured during the measurement process of the HRTFs
themselves. We measured the HRTFs in a real reverberant environment for azimuths of
0, 45, 90, and 135 degrees (left side) and elevations from 0 to 90 degrees in steps of
10 degrees, then constructed a 3-D sound system with the measured R-HRTFs and
headphones, and examined whether the sound reality is improved. As a result, we
succeeded in creating a 3-D spatial sound system with more reality than a traditional
HRTF-based sound system built by signal processing.
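The rendering step described here amounts to convolving a dry source with the measured reverberant head-related impulse responses (the time-domain counterparts of the R-HRTFs). Below is a minimal sketch of that step; the file names, the chosen direction, and the WAV-based I/O are illustrative assumptions, not details from the paper.

```python
# Minimal binaural rendering sketch: convolve a dry (mono) source with measured
# reverberant head-related impulse responses (R-HRIRs). File names and the chosen
# direction are illustrative assumptions.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

def render_binaural(dry_wav, hrir_left_wav, hrir_right_wav, out_wav):
    fs, dry = wavfile.read(dry_wav)            # mono dry source assumed
    fs_l, h_l = wavfile.read(hrir_left_wav)    # measured R-HRIR, left ear
    fs_r, h_r = wavfile.read(hrir_right_wav)   # measured R-HRIR, right ear
    assert fs == fs_l == fs_r, "all files must share one sampling rate"

    dry = dry.astype(np.float64)
    left = fftconvolve(dry, h_l.astype(np.float64))   # room reverberation is already
    right = fftconvolve(dry, h_r.astype(np.float64))  # contained in the measured R-HRIRs

    n = min(len(left), len(right))
    out = np.stack([left[:n], right[:n]], axis=1)
    out /= np.max(np.abs(out)) + 1e-12                # normalize to avoid clipping
    wavfile.write(out_wav, fs, (out * 32767.0).astype(np.int16))

# Example for a source at azimuth 45 deg, elevation 0 deg (hypothetical file names):
# render_binaural("dry.wav", "rhrir_az45_el00_L.wav", "rhrir_az45_el00_R.wav", "out.wav")
```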
[j-huang-02:2009]
K. Tanno, A. Saji, H. Li, T. Katsumata, and J. Huang. Reconstruction
and Evaluation of Dichotic Room Reverberation for 3-D Sound Generation.
In AES 127th Convention. AES, Oct. 2009.
Artificial reverberation is often used to increase reality and prevent in-the-head
localization in headphone-based 3-D sound systems. In traditional methods, diotic
reverberation is used. In this research, we measured the impulse responses of several
rooms with the four-point microphone method and calculated sound intensity vectors
with the sound intensity method. From the sound intensity vectors, we obtained the
image sound sources, and dichotic reverberation was reconstructed from the estimated
image sound sources. Comparison experiments were conducted for three kinds of
reverberation: diotic reverberation, dichotic reverberation, and dichotic reverberation
combined with Head-Related Transfer Functions. The results clarified that the 3-D
sounds reconstructed by dichotic reverberation with Head-Related Transfer Functions
have more spatial extension than those of the other methods.
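As an illustration of the intensity-based analysis, the following sketch estimates a time-averaged sound intensity vector from four pressure signals, assuming a simplified geometry (a reference microphone at the origin and three microphones at distance d along the x, y, and z axes); the actual four-point geometry and processing used in the paper may differ.

```python
# Sketch: time-averaged sound intensity vector from four pressure signals.
# Simplifying assumption (not necessarily the paper's geometry): microphone 0 at the
# origin and microphones 1..3 at distance d along the +x, +y, +z axes.
import numpy as np

def intensity_vector(p, fs, d=0.05, rho=1.21):
    """p: array of shape (4, N) with pressure signals; returns a 3-D intensity vector."""
    p0 = p[0]
    I = np.zeros(3)
    for axis in range(3):
        pi = p[axis + 1]
        grad = (pi - p0) / d                  # finite-difference pressure gradient
        v = -np.cumsum(grad) / (rho * fs)     # Euler's equation: v = -(1/rho) * integral of grad p
        p_mid = 0.5 * (p0 + pi)               # pressure at the midpoint between the two mics
        I[axis] = np.mean(p_mid * v)          # time-averaged intensity component
    return I

# The negated, normalized intensity vector points from the array back toward the
# (image) sound source:  doa = -I / np.linalg.norm(I)
```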
[markov-01:2009]
D. Vazhenina and K. Markov. Overview of the Current Russian Speech
Recognition Technology. In Proc. 12th International Conference on Humans
and Computers, pages 169–173, Dec. 2009.
In this paper, we present a review of the latest developments in Russian speech recognition
research. Although the underlying speech technology is mostly language independent,
differences between languages with respect to their structure and grammar
have a substantial effect on speech recognition system performance. The Russian
language has a complicated word-formation system characterized by a high degree of
inflection and a non-rigid word order. This greatly reduces the predictive power
of conventional language models and consequently increases the error rate. The current
statistical approach to building speech recognition systems requires large amounts
of both speech and text data. Several databases of Russian speech exist, and their
descriptions are given in the paper. In addition, we analyze and compare several
speech recognition systems developed in Russia and the Czech Republic and identify the
most promising directions for further research in Russian speech technology.
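The predictive power referred to above is commonly quantified by language-model perplexity; the short sketch below computes bigram perplexity with add-one smoothing as a generic illustration (it is not a method from the paper). A highly inflected vocabulary thins the bigram counts and drives this number up.

```python
# Generic illustration of language-model predictive power: bigram perplexity with
# add-one smoothing over word tokens. Not taken from the reviewed systems.
import math
from collections import Counter

def bigram_perplexity(train_tokens, test_tokens):
    vocab = set(train_tokens) | set(test_tokens)
    unigrams = Counter(train_tokens)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    log_prob, n = 0.0, 0
    for prev, word in zip(test_tokens, test_tokens[1:]):
        # add-one smoothed conditional probability p(word | prev)
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)   # lower perplexity = higher predictive power
```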
[markov-02:2009]
M. Sugiyama, A. Ronzhin, M. Prischepa, V. Budkov, K. Markov, and
A. Karpov. Speech Activity and Speaker Novelty Detection Methods for Meeting
Processing. In Proc. International Workshop on Sensing and Acting in
Ubiquitous Environments, SEACUBE 2009, Oct. 2009.
Segmentation of multi-speaker meeting audio data recorded with several microphones
into speech/silence frames is one of the first tasks in the development of a speaker
diarization system. Energy normalization techniques and signal correlation methods
are used in order to avoid the crosstalk problem, in which a participant's speech appears
on other participants' microphones. A comparison of different types of microphones
and the configuration of the recording devices installed inside the intelligent meeting
room are described. Special attention is paid to improving the novelty detection
performance of the on-line speaker diarization system.
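A minimal sketch of the kind of frame-level processing described above, assuming one microphone per participant and time-aligned recordings of equal length: frame energies are computed per channel, normalized per channel, and a frame is attributed to a participant only if that channel is both loud enough and clearly dominates the other channels (a crude crosstalk guard). The frame size, thresholds, and dominance margin are illustrative values, not those of the paper.

```python
# Sketch: speech/silence labeling for multi-channel meeting audio with a simple
# crosstalk guard. One channel per participant, two or more channels, equal-length
# and time-aligned signals are assumed; thresholds are illustrative.
import numpy as np

def frame_energies(x, frame_len=400, hop=160):
    """Log frame energies of a 1-D signal (e.g., 25 ms frames, 10 ms hop at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    e = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop: i * hop + frame_len].astype(np.float64)
        e[i] = 10.0 * np.log10(np.sum(frame ** 2) + 1e-10)
    return e

def label_speech(channels, speech_thresh_db=15.0, dominance_db=6.0):
    """channels: list of 1-D arrays, one per participant. Returns (n_ch, n_frames) bool."""
    E = np.stack([frame_energies(c) for c in channels])   # (n_ch, n_frames)
    E = E - E.min(axis=1, keepdims=True)                  # per-channel normalization
                                                          # (channel minimum as a crude noise floor)
    labels = np.zeros_like(E, dtype=bool)
    for t in range(E.shape[1]):
        k = np.argmax(E[:, t])
        others = np.delete(E[:, t], k)
        # attribute speech to channel k only if it is loud enough and clearly dominates
        # the other microphones (crosstalk appears on them too, but attenuated)
        if E[k, t] > speech_thresh_db and E[k, t] - others.max() > dominance_db:
            labels[k, t] = True
    return labels
```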
[markov-03:2009]
K. Markov. Advanced Approaches to Speaker Diarization of Audio
Documents. In Proc. of the Second IEEE International Conference on Ubimedia
Computing, 2009.
Speaker diarization is the process of annotating an audio document with information
about the speaker identity of speech segments along with their start and end times.
Assuming that the audio input consists of speech only, or that non-speech segments have
already been identified by another method, the task of speaker diarization is to find
who spoke when. Since there is no prior information about the number of speakers,
the main approach is to apply segment clustering. According to the clustering algorithm
used, speaker diarization systems can be divided into two groups: 1) those based on
agglomerative clustering, and 2) those based on on-line clustering.
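As a rough illustration of the first group, the sketch below performs BIC-based agglomerative clustering: it starts with one cluster per speech segment and repeatedly merges the pair with the lowest (most negative) delta-BIC, stopping when no merge is favorable. This is a generic textbook-style scheme, not a specific system discussed in the talk.

```python
# Sketch: BIC-based agglomerative clustering of speech segments. Each segment is a
# (n_frames, dim) array of acoustic features (e.g., MFCCs); segments judged to come
# from the same speaker are merged until no merge lowers the BIC.
import numpy as np

def _logdet_cov(X):
    cov = np.cov(X.T) + 1e-6 * np.eye(X.shape[1])   # small loading for stability
    return np.linalg.slogdet(cov)[1]

def delta_bic(Xi, Xj, lam=1.0):
    """Negative values favor merging Xi and Xj into one speaker cluster."""
    ni, nj, d = len(Xi), len(Xj), Xi.shape[1]
    n = ni + nj
    r = 0.5 * (n * _logdet_cov(np.vstack([Xi, Xj]))
               - ni * _logdet_cov(Xi) - nj * _logdet_cov(Xj))
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return r - penalty

def agglomerative_diarization(segments, lam=1.0):
    clusters = [np.asarray(s, dtype=np.float64) for s in segments]
    while len(clusters) > 1:
        best, pair = None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best:
                    best, pair = score, (i, j)
        if best is None or best >= 0:      # no remaining pair looks like one speaker
            break
        i, j = pair
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        del clusters[j]
    return clusters                        # one entry per hypothesized speaker
```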
[markov-04:2009]
K. Markov. Structured Models Design for Improved Speech Recognition.
In Proc. 12th International Conference on Humans and Computers,
pages 45–50, Dec. 2009.
Bayesian Networks (BN) are an excellent tool that can efficiently and flexibly encode
any structure through their topology, but it soon turned out that building large systems
is difficult because of the poor scalability of Dynamic Bayesian Networks (DBN). Our
approach is to keep the hierarchical structure of traditional ASR systems and to use
different, small BNs to model the pdfs at different hierarchical levels independently.
For example, at the lowest level, we use a BN to represent the HMM state pdf. At the
next (phonetic) model level, we use a BN to factor the underlying pdf. We describe
several examples of ASR models built using this approach and show that consistent
performance improvement can be achieved in various tasks and settings.
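A toy illustration of the lowest level mentioned above: the HMM state output pdf is represented by a small BN with one discrete auxiliary variable a (for example, a hypothetical noise-condition indicator), so that p(x|q) = sum_a p(a|q) p(x|a,q). The Gaussian observation model and the variable names below are assumptions made for the sketch, not the models from the paper.

```python
# Toy sketch of a factored HMM state pdf represented by a small Bayesian network:
#   p(x | q) = sum_a  p(a | q) * p(x | a, q)
# where a is a discrete auxiliary variable (e.g., a hypothetical noise-condition
# indicator). The diagonal-Gaussian observation model is an assumption for the sketch.
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def state_log_likelihood(x, prior_a, means, variances):
    """
    prior_a:   (A,) probabilities p(a | q) for one state q
    means:     (A, D) per-auxiliary-value Gaussian means
    variances: (A, D) per-auxiliary-value Gaussian variances
    """
    comps = [np.log(prior_a[a]) + log_gauss_diag(x, means[a], variances[a])
             for a in range(len(prior_a))]
    return np.logaddexp.reduce(comps)      # log sum_a p(a|q) p(x|a,q)

# Example with A=2 auxiliary values and D=3 features (illustrative numbers):
# ll = state_log_likelihood(np.array([0.1, -0.3, 0.5]), np.array([0.7, 0.3]),
#                           np.zeros((2, 3)), np.ones((2, 3)))
```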
[sugiyama-01:2009]
K. Sugai and M. Sugiyama. Similarity Degree Search in Multiple
Time-Series. In Proc. of ASJ Fall Meeting, pages 1–R–25, Japan, Sep. 2009.
ASJ.
[sugiyama-02:2009]
K. Sugai and M. Sugiyama. Similar Segment Search in Multiple
Time-Series Using RDDS Clustering Techniques. In Proc. of ASJ Spring Meeting,
pages 2–5–3, Japan, Mar. 2009. ASJ.
[sugiyama-06:2008]
M. Kuwabara and M. Sugiyama. Noise Robustness Evaluation of
Audio Features in Segment Search. In Proc. of ASJ Spring Meeting,
pages 3–6–1, Japan, Mar. 2010. ASJ.
[j-huang-03:2009]
Jie Huang. Sound Localization for Robot Navigation. IN-TECH, Vienna,
2009.
[sugiyama-03:2009]
Masahide Sugiyama, 2009. Reviewer of ASJ Transaction, ASJ.
[sugiyama-04:2009]
Masahide Sugiyama, 2009. Reviewer of IEICE Transaction, IEICE.
[sugiyama-05:2009]
Masahide Sugiyama, 2009. Board member of IEEE Sendai Chapter, IEEE.
[j-huang-04:2009]
Hiroaki Endou. Graduation Thesis: Individual difference analysis of
3-D sound perception with a horizontally arranged speaker system, University
of Aizu, 2009. Thesis Advisor: J. Huang.
[j-huang-05:2009] |
Yoshiyuki Morikawa. Graduation Thesis: Reconstruction of reverberant
environments by measured sound intensity vectors, University of Aizu,
2009. Thesis Advisor: J. Huang.
[j-huang-06:2009] |
Naoya Yunoue. Graduation Thesis: Frequency analysis for frontal
sound localization with near speakers, University of Aizu, 2009. Thesis Advisor: J. Huang.
[j-huang-07:2009] |
Kensuke Tanaka. Graduation Thesis: Improvement of 3-D sound
systems by vertical speaker arrays, University of Aizu, 2009. Thesis Advisor: J. Huang.
[j-huang-08:2009] |
Tomooki Otaka. Graduation Thesis: Environmental sound analysis
by instantaneous frequencies and filter banks, University of Aizu, 2009. Thesis Advisor: J. Huang.
[j-huang-09:2009]
Tetsuya Watanabe. Graduation Thesis: The learning effect of 3-D
sound perception with a horizontally arranged speaker system, University of
Aizu, 2009. Thesis Advisor: J. Huang and M. Guo.