Journal of Practical Hepatology ›› 2026, Vol. 29 ›› Issue (2): 199-204. doi: 10.3969/j.issn.1672-5069.2026.02.010

• Non-alcoholic fatty liver diseases •

Establishment and validation of a risk prediction model for fatty liver disease in health checkup individuals

Zhang Shuizhu, Ding Menghan, Zhou Shuping   

  1. Department of Gastroenterology, Graduate School, Bengbu Medical University, Bengbu 233000, Anhui Province, China
  • Received: 2025-08-18 Online: 2026-03-10 Published: 2026-03-13

Abstract: Objective To establish and validate an accurate, low-cost model for early prediction of fatty liver disease based on routine indicators available in health-checkup centers. Methods A retrospective cohort of 1212 individuals undergoing physical examination was analyzed, with fatty liver diagnosed by ultrasonography. Composite indices were calculated from the clinical data. Feature selection combined random forest, XGBoost and LASSO logistic regression within nested cross-validation (10-fold outer loop for validation, 5-fold inner loop for tuning). Variable importance was interpreted with SHAP values. Internal validation used the optimism-corrected AUC from 1000 bootstrap resamples, and external validation used a 30% random split. Results Of the 1212 individuals, 542 (44.7%) had fatty liver. The final model retained four variables: triglyceride-glucose-body mass index (TyG-BMI), body-fat percentage, diastolic blood pressure and the monocyte to high-density lipoprotein cholesterol ratio (MHR). The nested cross-validation AUC was 0.874 (95% CI: 0.855-0.893), the final-model AUC was 0.880 (95% CI: 0.861-0.898), the optimism-corrected AUC was 0.878 (95% CI: 0.860-0.897) and the external validation AUC was 0.866 (95% CI: 0.830-0.902). Calibration was excellent (slope ≈ 1; Hosmer-Lemeshow P=0.433), and the model remained robust under 30% Gaussian noise (AUC=0.878). SHAP analysis identified TyG-BMI as the dominant contributor. Conclusion The four-variable model shows high discrimination, excellent calibration and strong generalizability while relying on easily obtained variables, and it might offer health-checkup centers a "precise, efficient and low-cost" screening tool for fatty liver disease.

Key words: Fatty liver, Machine learning prediction model, Triglyceride-glucose-body mass index, Health check-up population
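
The abstract names two composite indices, TyG-BMI and MHR, without giving their formulas. The sketch below uses the definitions commonly found in the literature, namely TyG = ln(TG × FPG / 2) with triglycerides and fasting glucose in mg/dL, TyG-BMI = TyG × BMI, and MHR = monocyte count / HDL-C. Whether the authors used exactly these definitions and units is an assumption, and the function and parameter names are hypothetical.

```python
import math


def tyg_index(tg_mg_dl: float, fpg_mg_dl: float) -> float:
    """Triglyceride-glucose index, assuming both inputs are in mg/dL."""
    if tg_mg_dl <= 0 or fpg_mg_dl <= 0:
        raise ValueError("triglycerides and fasting glucose must be positive")
    return math.log(tg_mg_dl * fpg_mg_dl / 2.0)


def tyg_bmi(tg_mg_dl: float, fpg_mg_dl: float,
            weight_kg: float, height_m: float) -> float:
    """TyG-BMI: the TyG index multiplied by body mass index (kg/m^2)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    bmi = weight_kg / height_m ** 2
    return tyg_index(tg_mg_dl, fpg_mg_dl) * bmi


def mhr(monocytes_1e9_per_l: float, hdl_c_mmol_l: float) -> float:
    """Monocyte to HDL-C ratio; the units here are an assumption and
    should match whatever convention the laboratory report uses."""
    if hdl_c_mmol_l <= 0:
        raise ValueError("HDL-C must be positive")
    return monocytes_1e9_per_l / hdl_c_mmol_l


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(round(tyg_bmi(150.0, 95.0, 70.0, 1.75), 2))
    print(round(mhr(0.5, 1.25), 2))
```

If laboratory values are reported in mmol/L, they would need converting to mg/dL before calling `tyg_index`, since the logarithm makes the index unit-dependent.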