Graph Neural Networks for Malware Classification: Comparing Graph-Structured and Sequence-Based Representations

Authors

  • Meer Twana Qadir, Koya University
  • Saman Mirza Abdullah, Koya University

DOI:

https://doi.org/10.25195/ijci.v52i1.680

Keywords:

malware detection, API, GAT, GCN, PE Files

Abstract

Malware detection is one of the most important problems in cybersecurity because traditional signature-based methods cannot withstand polymorphic and obfuscated threats. This paper uses dynamic API call sequences as behavioral features and contrasts two representation methods: integer-based feature encoding fed into traditional machine learning models, and graph-based models built with Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Unlike prior studies, this work conducts a systematic head-to-head comparison of these approaches. Two datasets were employed. The first is a large public dataset with 42,797 malware and 1,079 benign samples. The second is a newly collected balanced dataset of 2,000 malware and 2,000 benign samples, gathered for this research through sandboxed execution. To allow a fair evaluation under 10-fold cross-validation, the API calls were pre-encoded both as fixed-length integer sequences and as directed call graphs. Ensemble and tree-based models achieved competitive accuracy (≈92% on the public dataset and ≈90% on the novel dataset). The graph-based models were more accurate, with GCN reaching 98.76% and GAT 98.33%. Graph neural networks learn not only the features of individual API calls but also the connectivity between them. As a result, they capture relational dependencies and contextual patterns in API call behavior, which yields a richer representation and stronger classification than integer-only encodings.
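The two representations compared above can be illustrated with a minimal sketch. The paper does not specify its exact encoding, so the following is an assumption throughout. It builds a vocabulary from training sequences and reserves ids 0 and 1 for padding and unknown APIs. Each sequence is truncated or right-padded to a fixed length. The call graph has one node per distinct API name, with a directed edge from each call to the call that immediately follows it, weighted by how often that transition occurs. The function names and the example API names are hypothetical.

```python
from collections import Counter

# Hypothetical reserved ids (an assumption, not taken from the paper).
PAD, UNK = 0, 1


def build_vocab(sequences):
    """Assign each distinct API name an integer id, starting after PAD/UNK."""
    vocab = {}
    for seq in sequences:
        for api in seq:
            if api not in vocab:
                vocab[api] = len(vocab) + 2
    return vocab


def encode_fixed_length(seq, vocab, max_len):
    """Integer-encode an API call sequence, truncating or right-padding to max_len.

    APIs not seen when the vocabulary was built map to UNK.
    """
    ids = [vocab.get(api, UNK) for api in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def build_call_graph(seq):
    """Build a directed transition graph from one API call sequence.

    Nodes are distinct APIs in first-seen order. Edge (i, j) is weighted by the
    number of times API j immediately follows API i.
    """
    nodes = list(dict.fromkeys(seq))
    index = {api: i for i, api in enumerate(nodes)}
    edges = Counter((index[a], index[b]) for a, b in zip(seq, seq[1:]))
    return nodes, dict(edges)


if __name__ == "__main__":
    train = [["NtCreateFile", "NtWriteFile", "NtClose"],
             ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey"]]
    vocab = build_vocab(train)
    sample = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose", "LdrLoadDll"]
    print(encode_fixed_length(sample, vocab, 8))  # [2, 3, 3, 4, 1, 0, 0, 0]
    print(build_call_graph(sample))
```

The integer vector is the kind of fixed-length input a tree-based or ensemble classifier consumes. The node list and weighted edge set are what a GCN or GAT layer would aggregate over. Before that, each node needs a feature vector, for example a one-hot encoding of its API id, which is again an assumption rather than the paper's stated choice. Unlike the integer vector, the graph keeps the repeated NtWriteFile call as a self-loop edge instead of a positional duplicate. This is one concrete way in which connectivity is exposed to the model.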

Author Biographies

Meer Twana Qadir, Koya University

Department of Software Engineering

Saman Mirza Abdullah, Koya University

Department of Software Engineering

Published

2026-04-21