G. Evangelopoulos, A. Zlatintsi, G. Skoumas, K. Rapantzikos, A. Potamianos, P. Maragos, Y. Avrithis
Video event detection and summarization using audio, visual and text saliency
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3553-3556, 2009.
ABSTRACT
Detection of perceptually important video events is formulated here on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion. Text saliency is extracted from part-of-speech tagging on the subtitle information available with most movie distributions. The various modality curves are integrated into a single attention curve, where the presence of an event may be signified in one or multiple domains. This multimodal saliency curve is the basis of a bottom-up video summarization algorithm that refines results from unimodal or audiovisual-based skimming. The algorithm performs favorably for video summarization in terms of informativeness and enjoyability.
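The abstract describes the pipeline only at a high level; the Python sketch below illustrates two of its ingredients under stated assumptions: a discrete Teager-Kaiser energy operator of the kind used for nonlinear energy tracking of audio modulations, and a weighted linear fusion of min-max-normalized per-frame saliency curves into a single attention curve. The function names, the normalization, and the example weights are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager-Kaiser energy operator:
    Psi[x](n) = x(n)^2 - x(n-1) * x(n+1).
    A simple nonlinear operator often used to track amplitude/frequency
    modulation energy in a (bandpassed) audio signal."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]      # pad the borders
    return np.maximum(psi, 0.0)            # clip spurious negative values

def fuse_saliency(curves, weights=None):
    """Min-max normalize each per-frame saliency curve to [0, 1] and combine
    them with a weighted linear sum into one multimodal attention curve.
    (Hypothetical fusion rule for illustration, not the paper's exact scheme.)"""
    curves = [np.asarray(c, dtype=float) for c in curves]
    normed = []
    for c in curves:
        rng = c.max() - c.min()
        normed.append((c - c.min()) / rng if rng > 0 else np.zeros_like(c))
    if weights is None:
        weights = np.ones(len(normed)) / len(normed)
    return sum(w * c for w, c in zip(weights, normed))

# Illustrative usage with stand-in saliency curves for audio, visual and text.
rng = np.random.default_rng(0)
audio_curve = teager_energy(rng.standard_normal(500))
visual_curve = rng.random(500)
text_curve = rng.random(500)
attention = fuse_saliency([audio_curve, visual_curve, text_curve],
                          weights=[0.4, 0.4, 0.2])
```

Frames or segments with the highest values of the fused attention curve would then be the natural candidates for inclusion in a bottom-up summary.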
19 April 2009