Model Info
Built an end-to-end ML pipeline for loan default prediction on structured financial data from the Give Me Some Credit dataset.
Implemented MICE imputation, feature engineering, and SMOTE for class balancing.
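To illustrate the class-balancing step, here is a minimal pure-Python sketch of the interpolation idea behind SMOTE: each synthetic point lies on the segment between a minority sample and one of its nearest minority neighbors. The function name `smote_sketch`, its parameters, and the toy data are hypothetical; the project's actual implementation is not shown here.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    Illustrative only: needs at least two minority points and uses a
    brute-force neighbor search.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Hypothetical toy minority class with two features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=3, k=2)
```

Because every synthetic point is a convex combination of two real minority points, it always stays inside the range spanned by the original minority samples.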
Trained an XGBoost classifier that achieved ROC-AUC = 0.9899 and 96% accuracy (precision = 0.96, recall = 0.96) at a decision threshold of 0.469, chosen by maximizing Youden’s J statistic.
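Youden's J statistic is J = TPR - FPR, and the selected threshold is the one that maximizes it. The following is a minimal sketch of that selection in plain Python; the function name `youden_threshold` and the toy labels and scores are hypothetical and do not come from the project's data.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing J = TPR - FPR over the observed scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(y_score)):
        # A sample is predicted positive when its score is at or above t.
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        j = tp / n_pos - fp / n_neg
        if j > best_j:  # on ties, the lowest threshold is kept
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical toy labels and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_score = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
threshold, j = youden_threshold(y_true, y_score)
```

In practice the same quantity is usually computed from the TPR and FPR arrays of a ROC curve; the explicit loop is used here only to keep the definition visible.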
Training Hyperparameters
• booster: gbtree
• scale_pos_weight: 1.0844731956681208
• lambda: 0.007863744044452415
• alpha: 1.3708881631313705
• subsample: 0.7792604678854678
• colsample_bytree: 0.8536059725156923
• colsample_bynode: 0.7889610964707088
• max_depth: 7
• min_child_weight: 5
• gamma: 0.059826987594721145
• eta: 0.19998789278709772
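As a sketch, the values above can be collected into a parameter dictionary in XGBoost's native naming. The `objective` entry is an assumption (the source does not state it), and the number of boosting rounds is not given, so it is left out. The scikit-learn wrapper uses different names for three of these parameters, which the small mapping below translates.

```python
# Hyperparameters as listed above, using XGBoost's native parameter names.
params = {
    "booster": "gbtree",
    "scale_pos_weight": 1.0844731956681208,
    "lambda": 0.007863744044452415,  # L2 regularization
    "alpha": 1.3708881631313705,     # L1 regularization
    "subsample": 0.7792604678854678,
    "colsample_bytree": 0.8536059725156923,
    "colsample_bynode": 0.7889610964707088,
    "max_depth": 7,
    "min_child_weight": 5,
    "gamma": 0.059826987594721145,
    "eta": 0.19998789278709772,      # learning rate
    # Assumption: binary classification objective, not stated in the source.
    "objective": "binary:logistic",
}

# The scikit-learn style wrapper expects these aliases for the same settings.
SKLEARN_ALIASES = {
    "lambda": "reg_lambda",
    "alpha": "reg_alpha",
    "eta": "learning_rate",
}
sklearn_params = {SKLEARN_ALIASES.get(k, k): v for k, v in params.items()}
```

Under these assumptions, `params` could be passed to `xgboost.train` together with a `DMatrix`, while `sklearn_params` could be unpacked into `XGBClassifier(**sklearn_params)`.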
ML Pipeline
Load Dataset -> MICE Imputation -> Feature Construction -> SMOTE (class balancing) -> Hyperparameter Tuning (Optuna & StratifiedKFold) -> Algorithm (XGBoost) -> Model Evaluation (ROC-AUC) -> Save Model (joblib)
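For the evaluation step, ROC-AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy data are hypothetical, and a library routine would normally be used on real data.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair. The pairwise loop is
    quadratic, so this is meant for illustration, not large datasets.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_score = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
auc = roc_auc(y_true, y_score)
```

Because the metric depends only on how scores are ordered, it is unaffected by the choice of decision threshold.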