Non-invasive diabetes diagnosis is difficult when crucial data is missing. This paper applies advanced tabular data augmentation to improve prediction for non-invasive diabetes diagnosis. We investigate several oversampling methods to improve the ability of machine learning models to make reliable predictions from limited data. Conditional Flow Matching (CFM) performed well, especially when combined with other augmentation methods. After extensive testing with oversampling and Gradient-Boosted Trees (GBT) for synthetic data generation, we found that the combination of WGAN and CFM trained with CatBoost yielded the best results. This combined method outperformed single-method augmentation strategies, achieving 98% specificity, 95.91% sensitivity, 96.26% accuracy, and a 96.05% F1-score on our non-invasive diabetes dataset. These findings show that multi-method augmentation can significantly improve machine learning models for non-invasive medical diagnosis.
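For readers less familiar with the four reported metrics, the following is a minimal sketch of how specificity, sensitivity, accuracy, and F1-score are conventionally derived from binary confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper's experiments, and the sketch assumes every denominator is non-zero.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Assumes each denominator is non-zero (at least one positive and one
    negative case, and at least one positive prediction).
    """
    sensitivity = tp / (tp + fn)          # recall on the positive (diabetic) class
    specificity = tn / (tn + fp)          # recall on the negative class
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting specificity and sensitivity alongside accuracy matters on imbalanced medical data, because accuracy alone can look high even when the minority class is poorly detected.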
GAN+CFM-Powered Data Augmentation and GBT Ensemble Learning for Improving Diabetes Mellitus Prediction
Updated: March 10, 2026