Helix Nguyen · 1 min read

GAN+CFM-Powered Data Augmentation and GBT Ensemble Learning for Improving Diabetes Mellitus Prediction

Updated: March 10, 2026

Non-invasive diabetes diagnosis is difficult when crucial data are missing. This paper applies tabular data augmentation to improve prediction for non-invasive diabetes diagnosis. We investigate several oversampling methods to improve the ability of machine learning models to make reliable predictions from limited data. Conditional Flow Matching (CFM) performed well, especially when combined with other augmentation methods. After extensive testing of oversampling methods and Gradient-Boosted Trees (GBT) for synthetic data generation, we found that WGAN and CFM trained with CatBoost yielded the best results. This combination outperformed single-method augmentation strategies, reaching 98% specificity, 95.91% sensitivity, 96.26% accuracy, and a 96.05% F1-score on our non-invasive diabetes dataset. These findings suggest that multi-method augmentation can significantly improve machine learning models for non-invasive medical diagnosis.
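For readers less familiar with the four reported metrics, the following minimal sketch shows how specificity, sensitivity, accuracy, and F1-score are derived from a binary confusion matrix. It is not the authors' evaluation code. The names `ConfusionCounts` and `binary_metrics` are hypothetical, and the counts are made-up illustrative numbers, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with 'diabetic' as the positive class."""
    tp: int  # diabetic patients predicted diabetic
    fp: int  # non-diabetic patients predicted diabetic
    tn: int  # non-diabetic patients predicted non-diabetic
    fn: int  # diabetic patients predicted non-diabetic


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(c: ConfusionCounts) -> dict:
    """Compute the four metrics named in the abstract from raw counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall on the positive class
    specificity = _ratio(c.tn, c.tn + c.fp)  # recall on the negative class
    precision = _ratio(c.tp, c.tp + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Illustrative counts only (assumed for this example, not taken from the paper).
counts = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
metrics = binary_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting specificity and sensitivity separately matters on imbalanced medical data, where accuracy alone can look high even if the minority (diabetic) class is poorly detected.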

