Data Quality Assessment and Preprocessing Techniques for Enhancing Machine Learning Model Performance

Olakunle Ebenezer Aribisala

doi:10.37082/IJIRMPS.v12.i3.232799

Short DOI: https://doi.org/g993mj

Country: Nigeria

Abstract: Model selection criteria matter for machine learning performance, but that performance also depends strongly on the quality of the data used for training and testing. Poorly managed data quality problems, such as missing data, noise, imbalance, redundancy, and variability, can lead to inaccurate predictions, reduced generalizability, and biased learning. As machine learning applications continue to spread across domains such as healthcare, manufacturing, climate modeling, finance, and natural resource management, the need for systematic data quality evaluation and robust preprocessing strategies is growing. This article offers an in-depth analysis of the major dimensions of data quality, namely accuracy, completeness, consistency, validity, timeliness, and integrity, and assesses how each of these dimensions affects model performance. The paper also covers the major data preprocessing techniques, including data cleaning, normalization, data transformation, feature selection, dimensionality reduction, outlier detection, handling of imbalanced data, and data augmentation.
In addition, the article examines automated and semi-automated frameworks developed to support data quality evaluation, and discusses recent advances that address domain-specific data challenges. The review also stresses the need to align preprocessing choices with model characteristics, data structure, and the target application. Experimental analyses and comparative evaluations are provided to illustrate how suitable preprocessing pipelines can improve machine learning results by increasing model robustness, effectiveness, and credibility.
The results indicate that optimized preprocessing strategies, grounded in a systematic evaluation of data quality, are an important part of optimizing machine learning model performance. The article concludes by identifying existing research gaps, such as the lack of standardized data quality indicators, the need for more sophisticated automation tools, and scalable preprocessing for large and complex datasets. Recommendations for future research directions and for sound, practical implementation frameworks are offered to support the development of high-quality, reliable machine learning systems.
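The abstract names data quality dimensions (completeness, validity) and preprocessing steps (data cleaning, outlier detection, normalization) without showing them concretely. The following is a minimal Python sketch, assuming a single numeric column stored in plain dictionaries, that first scores two quality dimensions and then chains median imputation, IQR-based outlier clipping, and min-max normalization. The metric definitions, the 1.5 x IQR clipping rule, the hypothetical `temp` column, and its plausibility range are illustrative assumptions, not the author's method or experimental setup.

```python
import statistics
from typing import Any, Callable, Dict, List, Optional, Sequence

Record = Dict[str, Optional[Any]]

# --- Assessment: simple, illustrative scores for two quality dimensions ---

def completeness(records: List[Record], column: str) -> float:
    """Share of records with a non-missing (not None) value in `column`."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(column) is not None) / len(records)

def validity(records: List[Record], column: str, rule: Callable[[Any], bool]) -> float:
    """Share of non-missing values in `column` that satisfy a domain `rule`."""
    values = [r[column] for r in records if r.get(column) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if rule(v)) / len(values)

# --- Preprocessing: cleaning, outlier handling, normalization ---

def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Data cleaning: replace missing entries (None) with the median of observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def clip_iqr(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Outlier handling: clip values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

def min_max(values: Sequence[float]) -> List[float]:
    """Normalization: rescale to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def run_pipeline(values: Sequence[Optional[float]], steps: Sequence[Callable]) -> List[float]:
    """Apply each preprocessing step in order, feeding each output to the next step."""
    out: Any = list(values)
    for step in steps:
        out = step(out)
    return out

# Hypothetical toy data and domain rule (illustrative assumptions, not from the article):
# two missing readings and one implausible value.
records: List[Record] = [
    {"temp": t} for t in [12.0, None, 15.0, 14.0, 13.0, 250.0, None, 16.0]
]

def plausible_temp(v: float) -> bool:
    """Hypothetical validity rule: air temperature in degrees Celsius."""
    return -50.0 <= v <= 60.0

report = {
    "completeness": completeness(records, "temp"),
    "validity": validity(records, "temp", plausible_temp),
}
cleaned = run_pipeline([r["temp"] for r in records], [impute_median, clip_iqr, min_max])

print(report)
print([round(v, 3) for v in cleaned])
```

The step order is a deliberate choice in this sketch: imputation runs first because the quantile computation cannot handle missing values, and clipping runs before min-max scaling so that a single extreme reading (here 250.0) does not compress every other value into a narrow band near zero.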

Keywords: Data quality; Machine learning; Preprocessing techniques; Feature engineering; Data cleaning; Dimensionality reduction; Imbalanced data handling; Data augmentation; Model performance optimization; Data governance

Paper Id: 232799

Published On: 2024-06-14

Published In: Volume 12, Issue 3, May-June 2024

All research papers published in this journal/on this website are openly accessible and licensed under the Creative Commons Attribution-ShareAlike 4.0 International License; accordingly, any user can read, download, copy, distribute, print, search, or link to the full texts of published articles, crawl them for indexing, pass them as data to any software, or use them for any other lawful purpose. The journal fulfills the DOAJ definition of open access.

International Journal of Innovative Research in Engineering & Multidisciplinary Physical Sciences
E-ISSN: 2349-7300 • Impact Factor - 9.907

A Widely Indexed Open Access Peer Reviewed Online Scholarly International Journal
