Sound Research WIKINDX
Xu, M., Duan, L.-Y., Cai, J., Chia, L.-T., Xu, C., & Tian, Q. (2004). HMM-based audio keyword generation. Lecture Notes in Computer Science, 3333, 556–574.
Added by: Mark Grimshaw-Aagaard (6/6/05, 10:48 AM)
Resource type: Journal Article
BibTeX citation key: Xu2004
Categories: General, Typologies/Taxonomies
Keywords: Audio retrieval, Semantic categorization
Creators: Cai, Chia, Duan, Tian, Xu, Xu
Collection: Lecture Notes in Computer Science
With the exponential growth in the production and creation of multimedia data, there is an increasing need for video semantic analysis. Audio, as a significant part of video, provides important cues to human perception when humans are browsing and understanding video contents. To detect semantic content by useful audio information, we introduce audio keywords which are sets of specific audio sounds related to semantic events. In our previous work, we designed a hierarchical Support Vector Machine (SVM) classifier for audio keyword identification. However, a weakness of our previous work is that audio signals are artificially segmented into 20 ms frames for frame-based SVM identification without any contextual information. In this paper, we propose a classification method based on Hidden Markov Modal (HMM) for audio keyword identification as an improved work instead of using hierarchical SVM classifier. Choosing HMM is motivated by the successful story of HMM in speech recognition. Unlike the frame-based SVM classification followed by majority voting, our proposed HMM-based classifiers treat specific sound as a continuous time series data and employ hidden states transition to capture context information. In particular, we study how to find an effective HMM, i.e., determining topology, observation vectors and statistical parameters of HMM. We also compare different HMM structures with different hidden states, and adjust time series data with variable length. Experimental data includes 40 minutes of basketball audio which comes from real-time sports games. Experimental results show that, for audio keyword generation, the proposed HMM-based method outperforms the previous hierarchical SVM.
Description of a method to find segments of a sports video via audio analysis -- using audio keywords (actual sound as opposed to text).
Not directly related to game audio but the analysis of sound for semantic classification provides some pointers to how humans derive semantic meaning from sound.
(In the abstract, "Hidden Markov Modal (HMM)" should actually be "Hidden Markov Model".)
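The contrast the abstract draws between frame-by-frame classification and sequence modelling can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes discrete observation symbols (0 and 1) rather than real audio feature vectors, and the two models, their labels ("rising", "falling") and all probabilities are hypothetical values chosen for illustration. It scores a whole sequence against one HMM per class using the forward algorithm and picks the class with the highest likelihood.

```python
import math
from collections import Counter


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-probabilities."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(hmm, observations):
    """Forward algorithm in log space: log P(observations | hmm)."""
    start, trans, emit = hmm["start"], hmm["trans"], hmm["emit"]
    n = len(start)
    # Initialise with the first observation.
    alpha = [safe_log(start[s]) + safe_log(emit[s][observations[0]])
             for s in range(n)]
    # Propagate through the remaining observations.
    for obs in observations[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][obs])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def classify_sequence(models, observations):
    """Return the label of the model that best explains the whole sequence."""
    return max(models, key=lambda label: log_likelihood(models[label], observations))


# Hypothetical two-state, left-to-right models (assumed values, not from the paper).
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
models = {
    # State 0 mostly emits symbol 0, state 1 mostly emits symbol 1.
    "rising": {"start": [1.0, 0.0], "trans": LEFT_TO_RIGHT,
               "emit": [[0.9, 0.1], [0.1, 0.9]]},
    # State 0 mostly emits symbol 1, state 1 mostly emits symbol 0.
    "falling": {"start": [1.0, 0.0], "trans": LEFT_TO_RIGHT,
                "emit": [[0.1, 0.9], [0.9, 0.1]]},
}

up = [0, 0, 1, 1]
down = [1, 1, 0, 0]

# Both sequences contain the same frames, so any per-frame vote sees them as identical.
same_frame_counts = Counter(up) == Counter(down)

label_up = classify_sequence(models, up)
label_down = classify_sequence(models, down)
print(same_frame_counts, label_up, label_down)
```

The two test sequences contain exactly the same symbols in a different order, so a classifier that labels each frame independently and then votes has no basis for telling them apart, whereas the state transitions of the HMMs make the order count. A realistic system would replace the discrete emission tables with densities over audio feature vectors and would estimate the parameters from training data.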
See also (Cano, Koppenberger, le Groux, Ricard, Wack, & Herrera 2005; Khan, McLeod, & Hovy 2004)
Cano, P., Koppenberger, M., le Groux, S., Ricard, J., Wack, N., & Herrera, P. (2005). Nearest neighbor automatic sound annotation with a WordNet taxonomy. Journal of Intelligent Systems, 24(2/3), 99–111.
Khan, L., McLeod, D., & Hovy, E. (2004). Retrieval effectiveness of an ontology-based model for information selection. Very Large Data Bases, 13, 71–85.
Added by: Mark Grimshaw-Aagaard Last edited by: Mark Grimshaw-Aagaard