Data Collection and Preprocessing in Web Usage Mining: Implementation and Analysis

Authors

  • Mohammed Ali Mohammed, University of Information Technology and Communications
  • Rula A. Hamid, University of Information Technology and Communications
  • Reem Razzaq AbdulHussein, University of Information Technology and Communications

DOI:

https://doi.org/10.25195/ijci.v50i2.486

Keywords:

web usage mining, access log file, data collection, data preprocessing

Abstract

Data collection and data preprocessing are crucial stages in web usage mining, mainly because log data is unstructured, diverse, and noisy. During data collection, log file datasets are loaded and merged. Effective and comprehensive data preprocessing is vital to the efficiency and scalability of the algorithms used in the pattern discovery phase of web usage mining. This work addresses both stages by introducing two innovative approaches. The first determines the device used to access the web, distinguishing between computers and mobile devices. The second determines user sessions and complete paths by using the referrer URL. The entire preprocessing pipeline is implemented in C#, and the source code is available on GitHub: https://github.com/Mohammed91/Web-Usage-Mining.
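
The two approaches named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the authors' C#, is not taken from their repository, and rests on several assumptions: log lines follow the Apache Combined Log Format, a user is identified by host plus user agent, sessions expire after 30 minutes of inactivity, and the keyword list in MOBILE_HINTS is a hypothetical stand-in for real user-agent detection. The function names parse_line, device_type, and build_sessions are illustrative only.

```python
import re
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Assumed input: Apache Combined Log Format
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical keyword list; tablets are counted as mobile here.
MOBILE_HINTS = ("mobile", "android", "iphone", "ipad")


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def device_type(agent):
    """Classify a user-agent string as 'mobile' or 'computer'."""
    agent = agent.lower()
    return "mobile" if any(hint in agent for hint in MOBILE_HINTS) else "computer"


def build_sessions(records, timeout=timedelta(minutes=30)):
    """Group requests into sessions per (host, agent) using the referrer.

    A request joins the user's latest session when that session is still
    within the timeout and the referrer path is a page already in it.
    If the referrer is not the last page visited, the pages the user must
    have stepped back through are re-added (path completion).
    """
    sessions = {}
    for rec in sorted(records, key=lambda r: r["time"]):
        user = (rec["host"], rec["agent"])
        referrer = rec["referrer"]
        ref_path = urlparse(referrer).path if referrer not in ("", "-") else None
        user_sessions = sessions.setdefault(user, [])
        current = user_sessions[-1] if user_sessions else None
        fresh = current is not None and rec["time"] - current["last"] <= timeout
        if fresh and ref_path in current["pages"]:
            pages = current["pages"]
            # Index of the most recent visit to the referrer page.
            i = len(pages) - 1 - pages[::-1].index(ref_path)
            # Backtracked pages, newest first, ending at the referrer.
            pages.extend(reversed(pages[i:-1]))
            pages.append(rec["url"])
            current["last"] = rec["time"]
        else:
            user_sessions.append({"pages": [rec["url"]], "last": rec["time"]})
    return sessions
```

As a usage illustration under the same assumptions: if one user requests /a, then /b, then /c, and afterwards requests /d with /b as the referrer, build_sessions records the path /a, /b, /c, /b, /d, reflecting a return to /b that was served from the browser cache and therefore never logged.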

Author Biographies

Mohammed Ali Mohammed, University of Information Technology and Communications

College of Business Informatics

Rula A. Hamid, University of Information Technology and Communications

College of Business Informatics

Reem Razzaq AbdulHussein, University of Information Technology and Communications

College of Business Informatics

Published

2024-11-16