Sound Research WIKINDX |
Resource type: Journal Article
BibTeX citation key: Velivelli2003 |
Categories: General, Typologies/Taxonomies
Keywords: Semantic categorization
Creators: Huang, Ngo, Velivelli
Collection: Lecture Notes in Computer Science |
Abstract |
The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. It was observed that the amount of information from the visual component alone was not enough to convey a semantic context to most portions of these videos, but a joint observation of the visual component and the audio component conveyed a better semantic context. From the observations that we made on the video data, we generated an audio score and a visual score. We later generated a weighted audio-visual score within an interval and adaptively expanded or shrunk this interval until we found a local maximum score value. The video ultimately will be divided into a set of intervals that correspond to the documentary scenes in the video. After we obtained a set of documentary scenes, we made a check for any redundant detections.
Added by: Mark Grimshaw-Aagaard |
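The interval search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give their scoring functions, weights, or step sizes, so the per-frame score lists, the weight `w_audio`, the mean-based interval score, and the names `interval_score`, `grow_interval`, and `segment` are all assumptions made for illustration. The final redundancy check mentioned in the abstract is not covered.

```python
from typing import List, Sequence, Tuple


def interval_score(audio: Sequence[float], visual: Sequence[float],
                   start: int, end: int, w_audio: float = 0.5) -> float:
    """Weighted audio-visual score over frames [start, end).

    Assumption: the score is the weighted sum of the mean audio score
    and the mean visual score inside the interval.
    """
    n = end - start
    mean_audio = sum(audio[start:end]) / n
    mean_visual = sum(visual[start:end]) / n
    return w_audio * mean_audio + (1.0 - w_audio) * mean_visual


def grow_interval(audio: Sequence[float], visual: Sequence[float],
                  start: int, init_len: int = 2, step: int = 1,
                  w_audio: float = 0.5) -> Tuple[int, float]:
    """Expand or shrink the interval end until the score is a local maximum."""
    total = len(audio)
    end = min(start + init_len, total)
    best = interval_score(audio, visual, start, end, w_audio)
    while True:
        # Candidate interval ends: one step longer, one step shorter.
        candidates = []
        if end + step <= total:
            candidates.append(end + step)
        if end - step > start:
            candidates.append(end - step)
        scored = [(interval_score(audio, visual, start, c, w_audio), c)
                  for c in candidates]
        if not scored:
            break
        top_score, top_end = max(scored)
        if top_score <= best:
            break  # neither neighbour improves the score: local maximum
        best, end = top_score, top_end
    return end, best


def segment(audio: Sequence[float], visual: Sequence[float],
            w_audio: float = 0.5) -> List[Tuple[int, int, float]]:
    """Split the whole video into consecutive (start, end, score) intervals."""
    scenes = []
    start = 0
    while start < len(audio):
        end, score = grow_interval(audio, visual, start, w_audio=w_audio)
        scenes.append((start, end, score))
        start = end
    return scenes
```

Calling `segment(audio, visual)` on two equal-length lists of per-frame scores returns consecutive intervals that together cover the whole sequence; each interval stands in for one candidate documentary scene under the assumptions above.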
Notes |
An experiment in combining video and audio analysis for indexing scenes and shots in documentaries by semantic context.
Added by: Mark Grimshaw-Aagaard |
Quotes |
p.228
One of the team's observations of documentaries is that usually "the visual pattern has a counterpart audio pattern." An example they give is:
audio class: speech <-------- speech + siren <-------- speech
visual sequence: aircraft <--- hanger [sic]/fire <-------- officer speaking
Added by: Mark Grimshaw-Aagaard |
Paraphrases |
p.231
For documentaries, they define 6 audio classes:
Keywords: Semantic categorization |