An Extractive Text News Summarization: A Hybrid Optimization with Ensemble Learning Approach

Authors

  • Mohammad Reza Feizi Derakhshi, University of Tabriz
  • Estabraq Abdulreda Kadhim, ComInSyS Lab, Department of Computer Engineering, University of Tabriz, Tabriz, Iran

DOI:

https://doi.org/10.25195/ijci.v51i2.625

Keywords:

Classification, Ensemble Learning, Extractive Text Summarization, Feature Selection, Hard Voting, Machine Learning, Natural Language Processing

Abstract

Automatic text summarization is crucial for managing the ever-increasing volume of textual data. However, existing methods often struggle to identify the features that determine sentence importance, which undermines narrative coherence and accuracy. The proposed approach leverages the Chi-square Binary Cuckoo Search (Chi-BCS) method for feature selection, optimizing text features to enrich summary content, and draws on insights from classification to ensure summaries are contextually relevant and concise. Feature selection improves the performance of machine learning models by reducing the dimensionality of the input data and removing irrelevant or redundant features. Classification, in turn, contributes to better summarization by distilling lengthy or redundant content into key points, enhancing both efficiency and accuracy. The proposed model applies advanced natural language processing and machine learning techniques to extractive summarization on the BBC and CNN/DailyMail datasets. Key features extracted from the text include named entity recognition, cue phrases, TF-IDF, sentence position, and sentiment analysis. Several algorithms are employed to improve classification performance: Decision Trees, Support Vector Classifier, Gradient Boosting, Random Forest, K-Nearest Neighbors, and Logistic Regression. Among all methods evaluated, Random Forest and the Ensemble Hard Voting approach achieved the highest F-scores of 96.26 and 0.9322 on the BBC and CNN/DailyMail datasets, respectively. In the summary evaluation, the ensemble method also delivered strong results, with ROUGE-2 and ROUGE-L F1 scores of 0.799 and 0.818, respectively, on BBC. On CNN/DailyMail, the ensemble model reached ROUGE-1 and ROUGE-2 scores of 0.275 and 0.5017, respectively, comparing favorably with the state of the art. These findings demonstrate that the proposed model is highly effective for both the classification and summarization of large-scale textual data.
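The hard-voting ensemble described in the abstract can be illustrated with a minimal sketch using scikit-learn's VotingClassifier. This is an assumption-laden approximation, not the authors' implementation: the synthetic features stand in for the paper's actual sentence features (TF-IDF, sentence position, cue phrases, named entities, sentiment), and only three of the six base classifiers are shown for brevity.

```python
# Sketch of a hard-voting ensemble for sentence classification
# (important vs. not important). Synthetic data stands in for the
# paper's real sentence features; this is not the authors' code.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Placeholder feature matrix: 500 "sentences", 10 numeric features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)

# Hard voting: each base classifier casts one vote per sentence,
# and the majority label wins.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)
ensemble.fit(X_tr, y_tr)
pred = ensemble.predict(X_te)
print("F1:", round(f1_score(y_te, pred), 3))
```

In an extractive pipeline, sentences the ensemble labels as important would then be concatenated in document order to form the summary.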

Published

2025-10-09