Credit Risk Prediction Model
Description
A machine learning model that predicts bank client defaults. It uses a weighted ensemble of CatBoost and LightGBM, trained on more than 170 engineered features, to assess credit risk.
Business Context
The model was developed as part of a credit risk assessment system for the banking sector. The primary goal is to reduce bank losses by automating the prediction of each client's default probability.
Model Performance
| Metric | Value |
|---|---|
| ROC-AUC | 0.7523 |
| Target KPI | 0.75 |
| Status | Achieved |
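For reference, ROC-AUC can be read as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. The sketch below computes it with scikit-learn on made-up labels and scores (not the project's data) and compares the result with the 0.75 target from the table; the variable names are illustrative only.

```python
from sklearn.metrics import roc_auc_score

# Toy labels (1 = default) and model scores; purely illustrative values.
y_true = [0, 0, 0, 1, 1]
y_score = [0.10, 0.20, 0.60, 0.35, 0.80]

TARGET_AUC = 0.75  # target KPI taken from the table above

auc = roc_auc_score(y_true, y_score)
meets_target = auc >= TARGET_AUC

print(f"ROC-AUC: {auc:.4f}, target met: {meets_target}")
```

In this toy case, 5 of the 6 (defaulter, non-defaulter) pairs are ordered correctly, so the score is 5/6.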
Tech Stack
- Language: Python 3.10
- Big Data Processing: Polars (Lazy Loading)
- Machine Learning:
  - CatBoost (weight: 0.05)
  - LightGBM (weight: 0.95)
- Infrastructure: GPU acceleration (NVIDIA RTX 3050)
- Tools: Scikit-learn, SciPy, Pandas, Matplotlib, Seaborn
Dataset
- Records: 3,000,000
- Files: 12 Parquet files
- Size: 4.5 GB
- Class Imbalance: 1:49 (2% positive class)
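With a 1:49 imbalance, validation folds need to preserve the positive rate, and positives are usually up-weighted during training. The sketch below uses synthetic labels (not the project's data) to show a stratified split and the conventional negatives-to-positives ratio that is often used as a starting point for `scale_pos_weight`. A tuned value can differ from this ratio, and the sketch does not reproduce the project's own setting.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)

# Synthetic labels with 2% positives (1:49), mirroring the ratio stated above.
n_samples = 5000
y = np.zeros(n_samples, dtype=int)
y[: n_samples // 50] = 1
rng.shuffle(y)
X = rng.normal(size=(n_samples, 3))  # placeholder features

# Conventional starting point: number of negatives per positive.
ratio_neg_to_pos = (y == 0).sum() / (y == 1).sum()

# Stratified folds keep the positive rate the same in every validation split.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_positive_rates = [y[val_idx].mean() for _, val_idx in skf.split(X, y)]

print(f"neg/pos ratio: {ratio_neg_to_pos:.2f}")
print("positive rate per fold:", fold_positive_rates)
```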
Key Features
Over 170 engineered features including:
- utilization_ratio: credit limit usage level
- overdue_ratio: share of overdue debt
- delays_per_loan: frequency of critical delays (90+ days)
Usage
Installation
pip install -r requirements.txt
import joblib
import polars as pl
# Load model (unpickling requires the custom CreditEnsemble class
# to be importable in the current session)
model = joblib.load("final_pipeline.pkl")
# Load data
df = pl.read_parquet("client_data.parquet")
# Make predictions
predictions = model.predict(df)
probabilities = model.predict_proba(df)
# Results
print(f"Default probability: {probabilities[:, 1]}")
# Alternative: fetch the model file from the Hugging Face Hub
from huggingface_hub import hf_hub_download
import joblib
# Download model
model_path = hf_hub_download(
repo_id="maxdavinci/Credit_Risk_Prediction_Model_0.75",
filename="final_pipeline.pkl"
)
# Load and use
model = joblib.load(model_path)
Engineering Solutions
- Scalability: Polars for efficient Big Data processing
- Class Imbalance: Stratified validation + scale_pos_weight (27.18)
- Ensembling: Rank Averaging method for stability
- Production Ready: Custom CreditEnsemble class compatible with sklearn.pipeline
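Rank averaging blends models by their ranks rather than their raw scores, which makes the result insensitive to differences in score scale between models. The sketch below is one way to do a weighted version using the weights listed under Tech Stack (0.05 for CatBoost, 0.95 for LightGBM). The function name `weighted_rank_average` and the toy scores are assumptions for illustration, not the project's CreditEnsemble implementation.

```python
import numpy as np
from scipy.stats import rankdata


def weighted_rank_average(scores_by_model, weights):
    """Blend model scores by a weighted average of their normalized ranks."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # make the weights sum to 1
    blended = np.zeros(len(scores_by_model[0]), dtype=float)
    for scores, weight in zip(scores_by_model, weights):
        ranks = rankdata(scores, method="average")  # 1..n, ties share the mean rank
        normalized = (ranks - 1) / max(len(ranks) - 1, 1)  # rescale ranks to [0, 1]
        blended += weight * normalized
    return blended


# Toy scores standing in for the two models' predicted default probabilities.
catboost_scores = np.array([0.10, 0.40, 0.35, 0.80])
lightgbm_scores = np.array([0.02, 0.03, 0.05, 0.90])

blended = weighted_rank_average(
    [catboost_scores, lightgbm_scores],
    weights=[0.05, 0.95],
)
print(blended)
```

The blended values preserve ordering information, which is what ROC-AUC measures, but they are not calibrated probabilities.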
Project Structure
Credit_Risk_Prediction_Model_0.75/
├── credit_risk_modeling.ipynb   # Jupyter notebook with code
├── final_pipeline.pkl           # Trained model (90 MB)
├── requirements.txt             # Dependencies
└── README.md                    # This file
Links
GitHub Repository: https://github.com/maxdavinci2022/Credit_Risk_Prediction_Model_0.75
Author: @maxdavinci2022