Sound Research WIKINDX

WIKINDX Resources

Velivelli, A., Ngo, C.-W., & Huang, T. S. (2003). Detection of documentary scene changes by audio-visual fusion. Lecture Notes in Computer Science, 2728, 227–238. 
Added by: Mark Grimshaw-Aagaard (6/9/05, 11:21 AM)   
Resource type: Journal Article
BibTeX citation key: Velivelli2003
Categories: General, Typologies/Taxonomies
Keywords: Semantic categorization
Creators: Huang, Ngo, Velivelli
Collection: Lecture Notes in Computer Science
Abstract
The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. It was observed that the amount of information from the visual component alone was not enough to convey a semantic context to most portions of these videos, but a joint observation of the visual component and the audio component conveyed a better semantic context. From the observations that we made on the video data, we generated an audio score and a visual score. We later generated a weighted audio-visual score within an interval and adaptively expanded or shrunk this interval until we found a local maximum score value. The video ultimately will be divided into a set of intervals that correspond to the documentary scenes in the video. After we obtained a set of documentary scenes, we made a check for any redundant detections.
Added by: Mark Grimshaw-Aagaard  
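The interval search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the per-frame score lists, the equal weighting (w_audio), the one-frame step, and the function names fused_score, grow_to_local_max and segment are all assumptions made for illustration. It shows only the general idea of fusing two scores over an interval and expanding or shrinking that interval until the fused score stops improving. The redundancy check mentioned in the abstract is not covered.

```python
from typing import List, Sequence, Tuple


def fused_score(audio: Sequence[float], visual: Sequence[float],
                start: int, end: int, w_audio: float = 0.5) -> float:
    """Weighted mean of audio and visual scores over the interval [start, end)."""
    n = end - start
    mean_audio = sum(audio[start:end]) / n
    mean_visual = sum(visual[start:end]) / n
    return w_audio * mean_audio + (1.0 - w_audio) * mean_visual


def grow_to_local_max(audio: Sequence[float], visual: Sequence[float],
                      start: int, init_len: int = 2,
                      w_audio: float = 0.5) -> Tuple[int, float]:
    """Move the interval end one frame at a time while the fused score improves.

    Hypothetical hill-climbing rule: try both expanding and shrinking by one
    frame, keep the better move, and stop at the first local maximum.
    """
    total = len(audio)
    end = min(start + init_len, total)
    best = fused_score(audio, visual, start, end, w_audio)
    while True:
        candidates = []
        if end + 1 <= total:        # expanding stays inside the video
            candidates.append(end + 1)
        if end - 1 > start:         # shrinking keeps at least one frame
            candidates.append(end - 1)
        scored = [(fused_score(audio, visual, start, e, w_audio), e)
                  for e in candidates]
        if not scored:
            return end, best
        top_score, top_end = max(scored)
        if top_score <= best:       # no neighbour is better: local maximum
            return end, best
        best, end = top_score, top_end


def segment(audio: Sequence[float], visual: Sequence[float],
            init_len: int = 2, w_audio: float = 0.5) -> List[Tuple[int, int]]:
    """Split the whole sequence into consecutive locally optimal intervals."""
    bounds = []
    start = 0
    while start < len(audio):
        end, _ = grow_to_local_max(audio, visual, start, init_len, w_audio)
        bounds.append((start, end))
        start = end
    return bounds
```

Calling segment(audio, visual) on two equal-length score lists returns a list of (start, end) frame ranges that together cover the whole sequence. In the paper the scores are derived from observations on the video data; here they are simply assumed to be given.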
Notes
An experiment in combining video and audio analysis for indexing scenes and shots in documentaries by semantic context.
Added by: Mark Grimshaw-Aagaard  
Quotes
p.228   The authors observe that, in documentaries, usually "the visual pattern has a counterpart audio pattern."

An example they give is:

audio class:     speech   <--- speech + siren    <--- speech
visual sequence: aircraft <--- hanger [sic]/fire <--- officer speaking
Added by: Mark Grimshaw-Aagaard
Paraphrases
p.231   For documentaries, they define 6 audio classes:

  • Speech
  • Speech + Music
  • Music
  • Speech + Noise
  • Noise
  • Silence
  Added by: Mark Grimshaw-Aagaard