Rochi, Musa and Schubert, Marcel (2025) Lactate Threshold Prediction. Other thesis, OST Ostschweizer Fachhochschule.
Full text not available from this repository.

Abstract
Blood lactate responses are collected during graded exercise testing and are used to assess physical fitness. They help estimate lactate thresholds, which indicate endurance capacity, and are used to define training zones. These training zones are important for avoiding under- or overtraining. At our project partner, Davos Sports and Health (DSH), these thresholds are currently adjusted manually by experts, which limits standardization and requires additional recalculation effort.
To address these limitations, this project aims to develop an AI model that predicts the aerobic threshold based on data characteristics and to implement a demonstrator that enables practical application.
We analyze graded exercise test data from young athletes to estimate the aerobic threshold using physiological and machine learning approaches. The dataset comprises 510 tests from 328 participants (median age 24 years, 62 percent male), collected using treadmill and cycle ergometer protocols, and includes approximately 430 anthropometric, physiological, and performance-related variables. After manual data cleaning and application of a standardized preprocessing pipeline with automated validation, 500 tests are retained for analysis.
Exploratory data analysis is conducted to assess data quality, distributions, and feature correlations. Physiologically informed feature engineering reduces the feature space to 32 dimensions, including parameters derived from exponential lactate-intensity curves, physiologically meaningful points along these curves, and two independent approximations of the aerobic threshold. To mitigate modality-specific effects, exercise intensity is normalized to each individual's maximum intensity.
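The normalization and curve-fitting steps can be illustrated with a minimal sketch. The abstract does not state the functional form of the exponential curve or the fitting procedure, so the two-parameter model `lactate = a * exp(b * x)`, the log-linear least-squares fit, the function names, and the sample values below are all assumptions made for illustration.

```python
import math

def normalize_intensity(intensities):
    """Scale stage intensities by the individual's maximum, giving values in (0, 1]."""
    peak = max(intensities)
    if peak <= 0:
        raise ValueError("maximum intensity must be positive")
    return [value / peak for value in intensities]

def fit_exponential(rel_intensity, lactate):
    """Fit lactate = a * exp(b * x) by least squares on log(lactate).

    Assumed model form; requires strictly positive lactate values.
    """
    if len(rel_intensity) != len(lactate) or len(lactate) < 2:
        raise ValueError("need at least two paired measurements")
    logs = [math.log(value) for value in lactate]
    n = len(logs)
    mean_x = sum(rel_intensity) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in rel_intensity)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rel_intensity, logs))
    b = sxy / sxx                      # slope of the log-linear regression
    a = math.exp(mean_y - b * mean_x)  # intercept, transformed back from log space
    return a, b

# Made-up illustrative values, not data from the thesis.
watts = [100, 150, 200, 250, 300]
lactate_mmol = [1.0, 1.3, 1.9, 3.2, 6.0]

x_rel = normalize_intensity(watts)
a, b = fit_exponential(x_rel, lactate_mmol)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Because intensity is expressed relative to the individual maximum, the fitted parameters `a` and `b` are on the same scale for treadmill and cycle ergometer tests, which is what makes them usable as shared features in this sketch.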
We train and evaluate multiple statistical and machine learning models, such as linear regression, ElasticNet, principal component regression, and partial least squares regression. More complex models include k-nearest neighbors, support vector regression, probabilistic Gaussian process regression, a multilayer perceptron, and XGBoost. The models predict heart-rate-, lactate-, and intensity-based definitions of the aerobic threshold. They are trained using a stratified 70/30 train-test split with tenfold cross-validation. Performance is assessed using mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination (R^2), and feature contributions are analyzed using SHAP values.
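For reference, the three performance measures follow from their standard definitions, as in the short sketch below. The function name and the heart rate values are hypothetical and are not taken from the thesis.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, and R^2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residual_ss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = sum(y_true) / n
    total_ss = sum((t - mean_true) ** 2 for t in y_true)
    mse = residual_ss / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),               # same unit as the target, e.g. bpm
        "r2": 1.0 - residual_ss / total_ss,   # undefined if all true values are equal
    }

# Hypothetical threshold heart rates in bpm, for illustration only.
true_bpm = [150, 160, 170, 180]
pred_bpm = [152, 158, 171, 177]

metrics = regression_metrics(true_bpm, pred_bpm)
print(metrics)  # mse = 4.5, rmse ≈ 2.121, r2 ≈ 0.964
```

RMSE is reported in the unit of the target, so for a heart-rate-based threshold it reads directly as a typical error in beats per minute.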
We find that simple models achieve strong predictive performance, while more complex approaches provide no substantial improvement under our data conditions. Heart-rate-based targets are considerably easier to predict than lactate- and intensity-based targets. The best-performing ElasticNet model achieves an RMSE of 5.102 bpm and an R^2 of 0.869 and predicts across both modalities (treadmill and cycle ergometer). We also implement an application, deployable via Docker, that supports automated predictions, interactive visualizations, clinician-driven adjustments, PDF report generation, and an expert analysis mode.
| Item Type: | Thesis (Other) |
|---|---|
| Subjects: | Area of Application > Business oriented; Area of Application > Healthcare, Medical Sector; Technologies > Programming Languages > Python; Technologies > Virtualization > Docker |
| Divisions: | Bachelor of Science FHO in Informatik > Student Research Project |
| Depositing User: | OST Deposit User |
| Date Deposited: | 26 Feb 2026 09:04 |
| Last Modified: | 26 Feb 2026 09:04 |
| URI: | https://eprints.ost.ch/id/eprint/1370 |
