Abstractive and extractive based YouTube transcript summarization: a hybrid approach
Abstract
Rapid advances in communication technology and ubiquitous access to computing have led to a proliferation of video content on YouTube and other social media platforms. However, extracting precise information from a video in a concise textual form remains a challenge. Both extractive and abstractive text summarization methods are prevalent in the literature. In this paper, three classical extractive text summarization methods, Luhn’s algorithm, the TextRank algorithm, and keyword-based summarization, are combined into a combined extractive (CE) method. To enhance its performance, the bidirectional and auto-regressive transformers (BART) model is investigated and integrated with CE to form a hybrid model. Further, we explore how the K-means clustering algorithm can be used for text summarization in general and, together with the proposed hybrid approach, to improve summarization. Using the CNN/DailyMail dataset, the text summarization methods are assessed based on ROUGE scores and the time taken for summary generation. The proposed hybrid method achieves a ROUGE score of 0.2644, which is better than the traditional extractive summarization methods. Combining the hybrid method with K-means further improves the score to 0.3227. The times taken for summary generation are 138.09 and 142.16 seconds, respectively. This work experiments with different classical and transformer-based text summarization techniques to explore their complementary aspects, and the results obtained are comparable with those of existing models while requiring less time for text summarization.
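As an illustration of the combined extractive idea described above, the following is a minimal sketch, not the paper's implementation. The abstract does not state how the three extractive scores are merged, so the max-normalization, the equal default weights, the simplified Luhn-style frequency score, the small stopword list, and all function names here are assumptions made for illustration. The BART and K-means stages are omitted.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "on",
             "for", "it", "that", "this", "with", "as", "by", "was", "be", "at"}


def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def normalize(scores):
    """Scale scores to [0, 1] by dividing by the maximum."""
    top = max(scores) if scores else 0.0
    return [s / top if top > 0 else 0.0 for s in scores]


def frequency_scores(token_lists):
    """Luhn-inspired simplification: mean document frequency of a sentence's words."""
    freq = Counter(w for toks in token_lists for w in toks)
    return [sum(freq[w] for w in toks) / len(toks) if toks else 0.0
            for toks in token_lists]


def keyword_scores(token_lists, keywords):
    """Count how many tokens in each sentence match the supplied keywords."""
    kw = {k.lower() for k in keywords}
    return [float(sum(1 for w in toks if w in kw)) for toks in token_lists]


def textrank_scores(token_lists, damping=0.85, iterations=30):
    """TextRank-style ranking: power iteration over a word-overlap sentence graph."""
    n = len(token_lists)
    sets = [set(t) for t in token_lists]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Require more than one distinct word so both logarithms are positive.
            if i != j and len(sets[i]) > 1 and len(sets[j]) > 1:
                overlap = len(sets[i] & sets[j])
                sim[i][j] = overlap / (math.log(len(sets[i])) + math.log(len(sets[j])))
    out_weight = [sum(row) for row in sim]
    rank = [1.0] * n
    for _ in range(iterations):
        rank = [
            (1 - damping) + damping * sum(
                sim[j][i] / out_weight[j] * rank[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return rank


def combined_extractive_summary(text, keywords, num_sentences=2,
                                weights=(1.0, 1.0, 1.0)):
    """Pick top sentences by a weighted sum of (frequency, TextRank, keyword) scores."""
    sentences = split_sentences(text)
    token_lists = [tokenize(s) for s in sentences]
    parts = [
        normalize(frequency_scores(token_lists)),
        normalize(textrank_scores(token_lists)),
        normalize(keyword_scores(token_lists, keywords)),
    ]
    combined = [sum(w * p[i] for w, p in zip(weights, parts))
                for i in range(len(sentences))]
    top = sorted(range(len(sentences)), key=lambda i: combined[i],
                 reverse=True)[:num_sentences]
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(top)]
```

In a hybrid pipeline of the kind the abstract describes, the sentences selected this way would then be passed to an abstractive model such as BART; how that hand-off is done in the paper is not specified here.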
Keywords
Abstractive summarization; Extractive summarization; K-Means clustering algorithm; Natural language processing; Pre-trained models; ROUGE scores; Transformers
DOI: http://doi.org/10.11591/ijeecs.v40.i3.pp1439-1452

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).