BALab seminars — Predictive analytics of software upgrades based on usage telemetry

Abstract

Software upgrades are critical for maintaining security, performance and feature adoption. However, many users delay or avoid upgrading despite the availability of newer versions. Understanding the behavioural factors that influence upgrade decisions remains challenging due to the scale and complexity of telemetry data. This thesis analyses large-scale telemetry data collected from engineering software to model user behaviour preceding version upgrades. Event-level logs containing timestamps, user identifiers, software versions and commands are transformed into a structured dataset through extensive preprocessing and feature engineering. Temporal aggregation windows are applied to capture usage intensity and engagement patterns.
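
As an illustration of the temporal aggregation step, the following minimal sketch turns event-level logs into per-user window features. It is not the thesis pipeline: the record layout, the feature names (event_count, distinct_commands, days_since_last_event), the sample events and the 7-day and 30-day windows are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical event records: (user_id, ISO timestamp, software version, command).
events = [
    ("u1", "2024-01-01T09:00:00", "1.0", "open"),
    ("u1", "2024-01-03T10:30:00", "1.0", "mesh"),
    ("u1", "2024-01-20T11:00:00", "1.0", "solve"),
    ("u2", "2024-01-19T08:15:00", "1.0", "open"),
]


def window_features(events, reference_time, window_days):
    """Aggregate per-user activity in the window ending at reference_time."""
    start = reference_time - timedelta(days=window_days)
    counts = defaultdict(int)
    commands = defaultdict(set)
    last_seen = {}
    for user, ts, _version, command in events:
        t = datetime.fromisoformat(ts)
        if start < t <= reference_time:
            counts[user] += 1                 # usage intensity
            commands[user].add(command)       # breadth of commands used
            last_seen[user] = max(last_seen.get(user, t), t)
    return {
        user: {
            "event_count": counts[user],
            "distinct_commands": len(commands[user]),
            "days_since_last_event": (reference_time - last_seen[user]).days,
        }
        for user in counts
    }


# Assumed window lengths; the abstract does not state which windows were used.
features_7d = window_features(events, datetime(2024, 1, 21), window_days=7)
features_30d = window_features(events, datetime(2024, 1, 21), window_days=30)
```

Computing the same features over several window lengths gives a model both a short-term and a longer-term view of each user's engagement.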

Multiple machine learning models, including Logistic Regression, Random Forest, XGBoost and Recurrent Neural Networks, are trained and evaluated to predict both upgrade likelihood and time-to-upgrade. For upgrade classification, Logistic Regression achieved strong predictive performance (ROC AUC = 0.86, PR AUC = 0.93), effectively identifying users likely to upgrade soon. For time-to-upgrade prediction, XGBoost provided the most balanced regression performance (RMSE = 162.79, R² = 0.783), explaining a substantial proportion of the variance in upgrade timing.
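
For readers less familiar with the reported metrics, the sketch below shows how ROC AUC, RMSE and R² are defined, using plain Python and made-up toy values. It only illustrates the definitions; it does not reproduce the thesis results, and in practice a library implementation would normally be used.

```python
import math


def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Share of target variance explained: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values (assumptions for illustration only).
upgraded = [1, 0, 1, 0]                  # 1 = user upgraded
upgrade_score = [0.9, 0.4, 0.6, 0.7]     # predicted upgrade likelihood
auc = roc_auc(upgraded, upgrade_score)

time_true = [100.0, 200.0, 300.0]        # observed time-to-upgrade
time_pred = [110.0, 190.0, 300.0]        # predicted time-to-upgrade
error = rmse(time_true, time_pred)
explained = r_squared(time_true, time_pred)
```

Because RMSE is expressed in the unit of the target, it is best read alongside R², which is unit-free.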

Unsupervised clustering further revealed distinct behavioural segments. K-means identified two primary groups: Power Users (~30%), characterized by high activity and feature exploration, and Average Users (~70%), exhibiting more moderate and routine engagement.
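
The segmentation idea can be sketched with a small K-means implementation (Lloyd's algorithm) for k = 2. The two features, the ten sample users and the choice of the first k points as initial centroids are assumptions for the example, not the configuration used in the thesis; the toy data are deliberately constructed so that three of the ten users fall into the high-activity group.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm.

    Simplification: the first k points serve as initial centroids, which keeps
    the example deterministic. Real pipelines usually standardize features and
    use a more robust initialization.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Hypothetical per-user features: (events per week, distinct commands used).
users = [
    (5, 3), (120, 40), (8, 4), (6, 2), (110, 35),
    (7, 3), (4, 2), (9, 5), (130, 45), (6, 3),
]
labels, centers = kmeans(users, k=2)

# Share of users in the cluster whose centroid has the higher activity level.
power_cluster = max(range(2), key=lambda c: centers[c][0])
power_share = labels.count(power_cluster) / len(users)
```

Labelling the cluster with the higher-activity centroid as the "power" segment is a naming convention applied after clustering; K-means itself only returns unlabeled groups.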

Overall, the results demonstrate that upgrade adoption is strongly driven by user engagement intensity, recency of activity, and exposure to new releases. These findings enable targeted strategies such as prioritizing highly engaged users for early-release programs, providing guided support to moderately active users and proactively reaching out to predicted late adopters to reduce upgrade delays.