Data Collection and Preprocessing in Web Usage Mining: Implementation and Analysis

Mohammed Ali Mohammed; Rula A. Hamid; Reem Razzaq AbdulHussein

doi:10.25195/ijci.v50i2.486

Authors

Mohammed Ali Mohammed, University of Information Technology and Communications
Rula A. Hamid, University of Information Technology and Communications
Reem Razzaq AbdulHussein, University of Information Technology and Communications

Keywords:

web usage mining, access log file, data collection, data preprocessing

Abstract

Data collection and data preprocessing are crucial stages in web usage mining, mainly because of the unstructured, diverse, and noisy nature of log data. During data collection, log file datasets are loaded and merged. Effective and comprehensive data preprocessing plays a vital role in ensuring the efficiency and scalability of algorithms used in the pattern discovery phase of web usage mining. This work aims to address these phases by introducing two innovative approaches. The first approach focuses on determining the device used for accessing the web, distinguishing between computers and mobile devices. The second approach aims to determine user sessions and complete paths by utilizing the referrer URL. The entire preprocessing pipeline has been implemented using the C# programming language, and the source code is available on GitHub at the following link: https://github.com/Mohammed91/Web-Usage-Mining.
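The two approaches can be illustrated with a short sketch. The block below is a minimal Python illustration under stated assumptions, not the authors' C# implementation. It labels a request as coming from a mobile device or a computer by matching a hypothetical list of tokens in the user-agent string. It then assigns requests to sessions with a commonly used referrer-based heuristic: a request joins a live session of the same user (approximated here by IP address plus user agent) only if its referrer was already visited in that session. If the referrer is not the last page visited, the request is treated as a Back-button return, and the pages revisited on the way back are re-inserted to complete the path. The names `MOBILE_TOKENS`, `TIMEOUT` (an assumed 30-minute inactivity limit), `device_type`, `Request`, `Session` and `sessionize` are hypothetical. The sketch also assumes that referrers have already been normalized to site-relative paths, with `"-"` marking direct or external entries.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical token list for a coarse user-agent check; not the authors' rules.
MOBILE_TOKENS = ("mobile", "android", "iphone", "ipad", "ipod",
                 "windows phone", "blackberry", "opera mini")

# Assumed inactivity threshold between two requests of the same session.
TIMEOUT = timedelta(minutes=30)


def device_type(user_agent: str) -> str:
    """Label a request 'mobile' or 'computer' by case-insensitive substring match."""
    ua = user_agent.lower()
    return "mobile" if any(token in ua for token in MOBILE_TOKENS) else "computer"


@dataclass
class Request:
    ip: str
    user_agent: str
    time: datetime
    url: str       # requested page, site-relative (e.g. "/B")
    referrer: str  # assumed normalized to the same form; "-" if absent or external


@dataclass
class Session:
    user: tuple          # (ip, user_agent), used as an approximate user identifier
    last_time: datetime  # time of the most recent request in this session
    path: list = field(default_factory=list)


def sessionize(requests):
    """Assign requests to sessions via the referrer and complete back-button paths."""
    sessions = []
    by_user = {}  # (ip, user_agent) -> sessions opened for that user
    for req in sorted(requests, key=lambda r: r.time):
        user = (req.ip, req.user_agent)
        # Only sessions that have not timed out may receive this request.
        live = [s for s in by_user.get(user, []) if req.time - s.last_time <= TIMEOUT]
        # Candidate sessions are those that already contain the referrer page.
        matches = [s for s in live if req.referrer in s.path]
        # Prefer the most recently active matching session.
        target = max(matches, key=lambda s: s.last_time, default=None)
        if target is None:
            # Direct entry, external referrer, or timeout: start a new session.
            target = Session(user=user, last_time=req.time)
            sessions.append(target)
            by_user.setdefault(user, []).append(target)
        elif target.path[-1] != req.referrer:
            # Path completion: the user went back to the referrer, so re-insert
            # the pages revisited on the way back (most recent first).
            last_ref = len(target.path) - 1 - target.path[::-1].index(req.referrer)
            target.path.extend(reversed(target.path[last_ref:-1]))
        target.path.append(req.url)
        target.last_time = req.time
    return sessions
```

For example, four requests from the same IP address and user agent within 30 minutes, namely `/A` (no referrer), `/B` (referrer `/A`), `/C` (referrer `/B`) and `/D` (referrer `/B`), produce a single session whose completed path is `/A, /B, /C, /B, /D`: the return from `/C` to `/B` is never logged because the browser serves it from cache, so it is inferred from the referrer of `/D`. Identifying users by IP address and user agent is only an approximation, since proxies and shared networks can merge distinct users.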

Author Biographies

Mohammed Ali Mohammed, University of Information Technology and Communications

College of Business Informatics

Rula A. Hamid, University of Information Technology and Communications

College of Business Informatics

Reem Razzaq AbdulHussein, University of Information Technology and Communications

College of Business Informatics

Published

2024-11-16

How to Cite

Ali Mohammed, M., A. Hamid, R., & Razzaq AbdulHussein, R. (2024). Data Collection and Preprocessing in Web Usage Mining: Implementation and Analysis. Iraqi Journal for Computers and Informatics, 50(2), 54–74. https://doi.org/10.25195/ijci.v50i2.486

Issue

Vol. 50 No. 2 (2024): Volume 50 Issue 2 Year 2024

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

IJCI applies the Creative Commons Attribution (CC BY) license to articles. A paper submitted to IJCI for publication is covered by the CC BY license. Under this Open Access license, the author agrees that anyone may reuse the article in whole or in part for any purpose, including commercial purposes. Anyone may copy, distribute, or reuse the content as long as the author and source are properly cited. This facilitates reuse and ensures that journal content is available for the needs of research.
If the manuscript contains photos, images, figures, tables, audio files, videos, etc., that the author or co-authors do not own, IJCI will require the author to provide proof that the owner of that content has given the author written permission to use it and has approved applying the CC BY license to that content. IJCI provides a form that the author can use to request permission from the owner. If the author does not have the owner's permission, IJCI will ask the author to remove that content and/or replace it with other content that the author owns or has permission to use.
Many authors assume that if they previously published a paper through another publisher, they have the right to reuse that content in their IJCI paper, but that is not necessarily the case; it depends on the license that covers the other paper. The author must ascertain the rights granted by that specific license (a license that enables the author to use the content). The author must obtain written permission from the publisher to use the content in the IJCI paper. The author should not include any content in his or her IJCI paper without having the right to use it, and should always give proper attribution.
Any data submitted with the paper should state its licensing terms, which should not be more restrictive than CC BY.
IJCI has the right to remove photos, captures, images, figures, tables, illustrations, audio, and video files from a paper, before or after publication, if this content was included without permission from its owner.