Leveraging 3D convolutional networks for effective video feature extraction in video summarization
Abstract
Video feature extraction is a central step in video processing: it distills the relevant information from raw video data into a compact representation that is easier to analyze and interpret. For this reason, it underpins many video understanding tasks. This study investigates video representations generated with three-dimensional (3D) convolutional neural networks (CNNs) for the task of video summarization. Feature vectors are extracted from video sequences using pretrained two-dimensional (2D) networks such as GoogLeNet and ResNet, along with 3D networks such as the 3D Convolutional Network (C3D) and the Two-Stream Inflated 3D Convolutional Network (I3D). To assess the representations, F1-scores are computed for selected generic and query-focused video summarization techniques using both the 2D and the 3D features. The experimental results show that feature vectors from 3D networks improve F1-scores. Unlike 2D networks, 3D networks also convolve over the time dimension, so they capture spatiotemporal features and provide a more comprehensive video representation.
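The F1-score used for evaluation can be illustrated with a minimal sketch. The abstract does not describe the exact evaluation protocol, so the code below assumes a common convention: each summary is a binary per-frame selection vector, and F1 is the harmonic mean of precision and recall over the frames shared with a reference summary. The function name `f1_score` and the sample vectors are hypothetical and are not taken from the paper.

```python
def f1_score(predicted, reference):
    """F1 between two binary per-frame selection vectors of equal length.

    Assumption: 1 marks a frame included in the summary, 0 a frame left out.
    """
    if len(predicted) != len(reference):
        raise ValueError("selection vectors must have the same length")

    # Frames selected by both the generated and the reference summary.
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    n_pred = sum(1 for p in predicted if p)
    n_ref = sum(1 for r in reference if r)

    # No shared frames (or an empty summary) gives a score of zero.
    if overlap == 0:
        return 0.0

    precision = overlap / n_pred
    recall = overlap / n_ref
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: 8 frames, 3 selected in each summary, 2 in common.
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
reference = [1, 0, 0, 0, 1, 1, 0, 0]
print(round(f1_score(predicted, reference), 3))  # 0.667
```

Under this assumption, comparing 2D and 3D representations amounts to running the same summarization technique on each feature set and comparing the resulting scores against the same reference summaries.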
DOI: http://doi.org/10.11591/ijeecs.v37.i3.pp1616-1625
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).