Sound Research WIKINDX

WIKINDX Resources

Velivelli, A., Ngo, C.-W., & Huang, T. S. (2003). Detection of documentary scene changes by audio-visual fusion. Lecture Notes in Computer Science, 2728, 227–238.
Added by: Mark Grimshaw-Aagaard (09/06/2005, 11:21)

Resource type: Journal Article
Published
BibTeX citation key: Velivelli2003

Categories: Typologies/Taxonomies
Keywords: Semantic categorization
Creators: Huang, Ngo, Velivelli
Collection: Lecture Notes in Computer Science

Abstract

The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. It was observed that the amount of information from the visual component alone was not enough to convey a semantic context to most portions of these videos, but a joint observation of the visual component and the audio component conveyed a better semantic context. From the observations that we made on the video data, we generated an audio score and a visual score. We later generated a weighted audio-visual score within an interval and adaptively expanded or shrunk this interval until we found a local maximum score value. The video ultimately will be divided into a set of intervals that correspond to the documentary scenes in the video. After we obtained a set of documentary scenes, we made a check for any redundant detections.
Added by: Mark Grimshaw-Aagaard
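
The interval search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the audio and visual scores are computed or weighted, so the per-frame score lists, the mean-based fusion, the `audio_weight`, `init_len`, `step`, and `min_len` parameters, and all function names are assumptions made for illustration. The sketch grows or shrinks the end of an interval one step at a time and stops when neither move raises the weighted score, that is, at a local maximum. The redundancy check mentioned at the end of the abstract is not covered.

```python
from statistics import fmean


def fused_score(audio, visual, start, end, audio_weight=0.5):
    """Weighted audio-visual score over frames [start, end).

    Assumption: the score of an interval is the weighted sum of the mean
    audio score and the mean visual score inside it.
    """
    a = fmean(audio[start:end])
    v = fmean(visual[start:end])
    return audio_weight * a + (1.0 - audio_weight) * v


def find_scene_end(audio, visual, start, init_len=4, step=1, min_len=2,
                   audio_weight=0.5):
    """Expand or shrink the interval end until the score stops improving."""
    n = len(audio)
    end = min(n, start + init_len)
    best = fused_score(audio, visual, start, end, audio_weight)
    while True:
        candidates = []
        if end + step <= n:                  # try expanding
            candidates.append(end + step)
        if end - step >= start + min_len:    # try shrinking
            candidates.append(end - step)
        scored = [(fused_score(audio, visual, start, e, audio_weight), e)
                  for e in candidates]
        if not scored:
            return end
        top_score, top_end = max(scored)
        if top_score <= best:                # local maximum reached
            return end
        best, end = top_score, top_end


def segment(audio, visual, **kwargs):
    """Split the whole sequence into consecutive (start, end) intervals."""
    if len(audio) != len(visual):
        raise ValueError("audio and visual score lists must be equally long")
    scenes = []
    start = 0
    while start < len(audio):
        end = find_scene_end(audio, visual, start, **kwargs)
        scenes.append((start, end))
        start = end
    return scenes
```

Calling `segment(audio, visual, init_len=2, min_len=1)` on two equally long lists of per-frame scores returns a list of half-open frame intervals, each a candidate documentary scene under these assumptions.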

Notes

An experiment in combining video and audio analysis for indexing scenes and shots in documentaries by semantic context.
Added by: Mark Grimshaw-Aagaard

Quotes

p. 228

One of the team's observations of documentaries is that usually "the visual pattern has a counterpart audio pattern."

An example they give is:

audio class:     speech   <-------- speech + siren    <-------- speech
visual sequence: aircraft <-------- hanger [sic]/fire <-------- officer speaking

Added by: Mark Grimshaw-Aagaard (03/12/2004, 10:36)

Paraphrases

p. 231

For documentaries, they define six audio classes:

Speech
Speech + Music
Music
Speech + Noise
Noise
Silence

Added by: Mark Grimshaw-Aagaard (03/12/2004, 10:36)
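
The six audio classes listed above could be represented as an enumeration, as in the following Python sketch. The class labels come from the paraphrase; the `AudioClass` and `label_runs` names, and the idea of collapsing per-frame labels into runs, are hypothetical and only show one way such labels might be handled when indexing a soundtrack.

```python
from enum import Enum
from itertools import groupby


class AudioClass(Enum):
    """The six audio classes listed in the paraphrase (p. 231)."""
    SPEECH = "Speech"
    SPEECH_MUSIC = "Speech + Music"
    MUSIC = "Music"
    SPEECH_NOISE = "Speech + Noise"
    NOISE = "Noise"
    SILENCE = "Silence"


def label_runs(labels):
    """Collapse a per-frame label sequence into (label, start, end) runs.

    Hypothetical helper: 'end' is exclusive, so each run covers the
    frames start, start + 1, ..., end - 1.
    """
    runs = []
    start = 0
    for label, group in groupby(labels):
        length = sum(1 for _ in group)
        runs.append((label, start, start + length))
        start += length
    return runs
```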
