International Journal of Innovative Research in Engineering & Multidisciplinary Physical Sciences
E-ISSN: 2349-7300

Impact Factor: 9.907

A Widely Indexed Open Access Peer Reviewed Online Scholarly International Journal

Impact of Data Quality on Machine Learning Models in Telecom and Media

Authors: Mahesh Mokale

DOI: https://doi.org/10.5281/zenodo.15155368

Short DOI: https://doi.org/g9ctbn

Country: USA


Abstract: Telecom and media companies increasingly rely on machine learning (ML) to drive innovation, improve operational efficiency, and deliver personalized customer experiences. These industries generate and consume vast volumes of data daily, from customer interactions, usage logs, and social media behavior to network sensor outputs and streaming analytics. Harnessed effectively, this data enables a broad range of ML applications such as churn prediction, fraud detection, targeted advertising, network optimization, and personalized content recommendation. However, the efficacy of ML models is fundamentally tied to the quality of the data feeding them. Poor data quality (inaccuracy, incompleteness, inconsistency, latency, irrelevance, and bias) can significantly degrade model performance. Models trained on flawed datasets are more likely to produce skewed or misleading outputs, leading to erroneous insights, misinformed strategies, wasted resources, and ultimately dissatisfied customers. In telecom and media, where decisions derived from ML insights affect millions of users in real time, these risks are amplified. This white paper explores the multifaceted impact of data quality on ML models within telecom and media, beginning with a detailed analysis of the types of data typically encountered in these sectors and the specific challenges they present. It highlights how subpar data quality can introduce systemic bias, reduce model generalizability, increase error rates, and lower overall system reliability and trustworthiness. The paper also outlines data quality issues common in these industries, including duplicate records from multiple data sources, inconsistent data formats, imbalanced usage data, and outdated streaming or log data.
In addition to identifying these challenges, the paper presents real-world case studies demonstrating the quantifiable benefits of data cleaning and preprocessing. It details how organizations improved ML performance metrics and customer satisfaction by addressing core data quality issues. Moreover, it recommends actionable strategies and modern toolsets to ensure robust data pipelines that support scalable and trustworthy ML models. Ultimately, this paper underscores that data quality is not merely a technical hygiene practice but a strategic imperative for companies aiming to compete effectively in the evolving telecom and media landscapes.

Paper Id: 232349

Published On: 2024-06-06

Published In: Volume 12, Issue 3, May-June 2024
