An Extractive Text News Summarization: A Hybrid Optimization with Ensemble Learning Approach

Authors

  • Mohammad Reza Feizi Derakhshi, University of Tabriz
  • Estabraq Abdulreda Kadhim, ComInSyS Lab, Department of Computer Engineering, University of Tabriz, Tabriz, Iran

DOI:

https://doi.org/10.25195/ijci.v51i2.625

Keywords:

Classification, Ensemble Learning, Extractive Text Summarization, Feature Selection, Hard Voting, Machine Learning, Natural Language Processing

Abstract

Automatic text summarization is crucial for managing the ever-increasing volume of textual data. However, existing methods often struggle to identify the features that determine sentence importance, which undermines narrative coherence and accuracy. The proposed approach leverages the Chi-square Binary Cuckoo Search (Chi-BCS) method for feature selection, optimizing text features to enrich summary content, and draws on insights from classification to ensure summaries are contextually relevant and concise. Feature selection improves the performance of machine learning models by reducing the dimensionality of the input data and removing irrelevant or redundant features. Classification, in turn, contributes to better summarization by distilling lengthy or redundant content into key points, enhancing both efficiency and accuracy. The proposed model applies advanced natural language processing and machine learning techniques to extractive summarization on the BBC and CNN/DailyMail datasets. Key features extracted from the text include named entity recognition, cue phrases, TF-IDF, sentence position, and sentiment analysis. Several algorithms are employed to improve classification performance: Decision Trees, Support Vector Classifier, Gradient Boosting, Random Forest, K-Nearest Neighbors, and Logistic Regression. Among all methods evaluated, Random Forest and the Ensemble Hard Voting approach achieved the highest F-scores of 96.26 and 0.9322 on the BBC and CNN/DailyMail datasets, respectively. In the summary evaluation, the ensemble method also delivered strong results, with ROUGE-2 and ROUGE-L F1 scores of 0.799 and 0.818, respectively, on BBC. On CNN/DailyMail, the ensemble model reached ROUGE-1 and ROUGE-2 scores of 0.275 and 0.5017, respectively, comparing favorably with the state of the art. These findings demonstrate that the proposed model is highly effective for both the classification and summarization of large-scale textual data.
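The hard-voting ensemble described in the abstract can be illustrated with a minimal sketch using scikit-learn's VotingClassifier. This is an assumption-laden approximation, not the authors' implementation: the synthetic features stand in for the paper's actual sentence features (TF-IDF, sentence position, cue phrases, named entities, sentiment), and only three of the six base classifiers are shown for brevity.

```python
# Sketch of a hard-voting ensemble for sentence classification
# (important vs. not important). Synthetic data stands in for the
# paper's real sentence features; this is not the authors' code.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Placeholder feature matrix: 500 "sentences", 10 numeric features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)

# Hard voting: each base classifier casts one vote per sentence,
# and the majority label wins.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)
ensemble.fit(X_tr, y_tr)
pred = ensemble.predict(X_te)
print("F1:", round(f1_score(y_te, pred), 3))
```

In an extractive pipeline, sentences the ensemble labels as important would then be concatenated in document order to form the summary.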

Published

2025-10-09