Integrating Data Versioning and Management into CI/CD Pipelines for Machine Learning
Authors: Swamy Prasadarao Velaga
DOI: https://doi.org/10.5281/zenodo.12805518
Short DOI: https://doi.org/gt442h
Country: India
Abstract: The rapid evolution and widespread adoption of machine learning (ML) applications have underscored the importance of data management practices that ensure reproducibility, reliability, and scalability in model development and deployment. Integrating data versioning and management into Continuous Integration and Continuous Deployment (CI/CD) pipelines for ML is a key strategy for addressing these challenges. This survey paper explores the significance of data versioning in CI/CD pipelines, examining key benefits such as reproducible experimental results, effective management of data drift, and compliance with regulatory standards. We also analyze the challenges of integrating data versioning, including handling large datasets, managing dynamic data sources, and ensuring compatibility across diverse data formats. The paper then discusses best practices and implementation strategies for adopting data versioning in CI/CD pipelines, emphasizing automation, scalability, and integration with Machine Learning Operations (MLOps). Finally, we outline promising future research directions in data versioning, including advances in automation, security, and cross-domain collaboration, aimed at further improving the reliability and transparency of ML workflows. By addressing these aspects, this paper provides a comprehensive overview of current trends, challenges, and opportunities in leveraging data versioning to optimize CI/CD pipelines for machine learning applications.
Keywords: Continuous Deployment, AI Systems, Machine Learning Models, Data Versioning
Paper Id: 230795
Published On: 2021-02-03
Published In: Volume 9, Issue 1, January-February 2021
Cite This: Integrating Data Versioning and Management into CI/CD Pipelines for Machine Learning - Swamy Prasadarao Velaga - IJIRMPS Volume 9, Issue 1, January-February 2021. DOI: https://doi.org/10.5281/zenodo.12805518