Credit Risk Prediction Model

Description

A machine learning model that predicts bank client defaults. It combines CatBoost and LightGBM in a weighted ensemble and uses more than 170 engineered features to assess credit risk.

Business Context

The project develops a credit risk assessment system for the banking sector. The primary goal is to reduce bank losses by automating the estimation of each client's default probability.

Model Performance

Metric	Value
ROC-AUC	0.7523
Target KPI (ROC-AUC)	≥ 0.75
Status	✅ Achieved

Tech Stack

Language: Python 3.10
Big Data Processing: Polars (Lazy Loading)
Machine Learning:
- CatBoost (weight: 0.05)
- LightGBM (weight: 0.95)
Infrastructure: GPU acceleration (NVIDIA RTX 3050)
Tools: scikit-learn, SciPy, Pandas, Matplotlib, Seaborn

Dataset

Records: 3,000,000
Files: 12 Parquet files
Size: 4.5 GB
Class Imbalance: 1:49 (2% positive class)
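
The imbalance figures above translate into concrete class counts. The sketch below works through that arithmetic and derives the raw negative-to-positive ratio, which is a common starting point when choosing a scale_pos_weight value. The function name class_counts is hypothetical, and the 2% rate is treated as exact, which the rounded figure in this README may not be.

```python
def class_counts(n_records: int, positive_rate: float) -> tuple[int, int]:
    """Split a record count into (positives, negatives) for a given positive rate."""
    positives = round(n_records * positive_rate)
    return positives, n_records - positives


# Figures taken from the Dataset section; the 2% rate is assumed to be exact.
positives, negatives = class_counts(3_000_000, 0.02)
raw_ratio = negatives / positives  # negatives per positive

print(f"positives={positives:,} negatives={negatives:,} ratio=1:{raw_ratio:.0f}")
```

Under these assumptions there are 60,000 defaults among 3,000,000 records. The scale_pos_weight of 27.18 reported under Engineering Solutions is lower than this raw ratio of 49; the README does not say how that value was chosen.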

Key Features

Over 170 engineered features including:

utilization_ratio — credit limit usage level
overdue_ratio — share of overdue debt
delays_per_loan — frequency of critical delays (90+ days)
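
As a rough illustration of how ratio features like these are typically computed, the sketch below derives the three from per-client aggregates. The input field names (balance, credit_limit, and so on), the safe_ratio helper, and the exact formulas are assumptions made for illustration; the definitions in the notebook may differ, and there they would more likely be written as Polars column expressions.

```python
from typing import Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Divide, returning None when the denominator is zero or None."""
    if not denominator:
        return None
    return numerator / denominator


def ratio_features(client: dict) -> dict:
    """Build the three example features from hypothetical per-client aggregates."""
    return {
        # Share of the credit limit currently in use
        "utilization_ratio": safe_ratio(client["balance"], client["credit_limit"]),
        # Share of total debt that is overdue
        "overdue_ratio": safe_ratio(client["overdue_debt"], client["total_debt"]),
        # Number of 90+ day delays per loan
        "delays_per_loan": safe_ratio(client["delays_90_plus"], client["n_loans"]),
    }


# Hypothetical client record, for illustration only
example = {
    "balance": 45_000.0, "credit_limit": 60_000.0,
    "overdue_debt": 5_000.0, "total_debt": 50_000.0,
    "delays_90_plus": 2, "n_loans": 4,
}
features = ratio_features(example)
print(features)
```

Returning None for a zero denominator keeps clients without a credit limit or loans from producing infinities; gradient-boosting libraries such as CatBoost and LightGBM can handle missing values natively.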

Usage

Installation

pip install -r requirements.txt

Local inference

import joblib
import polars as pl

# Load model
model = joblib.load("final_pipeline.pkl")

# Load data
df = pl.read_parquet("client_data.parquet")

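# Make predictions: hard class labels and per-class probabilities
predictions = model.predict(df)
probabilities = model.predict_proba(df)

# Column 1 holds the probability of the positive (default) class
print(f"Default probabilities: {probabilities[:, 1]}")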

Loading from the Hugging Face Hub

from huggingface_hub import hf_hub_download
import joblib

# Download model
model_path = hf_hub_download(
    repo_id="maxdavinci/Credit_Risk_Prediction_Model_0.75",
    filename="final_pipeline.pkl"
)

# Load and use
model = joblib.load(model_path)

Engineering Solutions

Scalability: Polars lazy loading to process the 4.5 GB dataset (3,000,000 records in 12 Parquet files)
Class Imbalance: Stratified validation + scale_pos_weight (27.18)
Ensembling: Rank averaging of CatBoost and LightGBM predictions for stability
Production Ready: Custom CreditEnsemble class compatible with sklearn.pipeline
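
Rank averaging replaces each model's raw scores with their ranks before blending, so differently calibrated outputs contribute on a common scale. The following is a minimal stdlib-only sketch using the 0.05 / 0.95 weights listed under Tech Stack. The function names and example scores are hypothetical, and dividing ranks by the sample count is one common normalisation, not necessarily what CreditEnsemble does. On real data, scipy.stats.rankdata computes the same average ranks far more efficiently.

```python
def average_ranks(scores):
    """Return 1-based ranks; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        # Extend `end` over the run of scores equal to the one at `start`
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_average(catboost_scores, lightgbm_scores, w_catboost=0.05, w_lightgbm=0.95):
    """Blend two score lists by weighted average of their normalised ranks."""
    n = len(catboost_scores)
    cb = [r / n for r in average_ranks(catboost_scores)]
    lgb = [r / n for r in average_ranks(lightgbm_scores)]
    return [w_catboost * c + w_lightgbm * l for c, l in zip(cb, lgb)]


# Hypothetical per-client scores from the two models
catboost_scores = [0.10, 0.40, 0.35, 0.80]
lightgbm_scores = [0.20, 0.30, 0.90, 0.70]
blended = rank_average(catboost_scores, lightgbm_scores)
print(blended)
```

The blended values preserve an ordering of clients by risk but are not calibrated probabilities. That is sufficient for ROC-AUC, which depends only on how the scores rank the samples.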

Project Structure

Credit_Risk_Prediction_Model_0.75/
├── credit_risk_modeling.ipynb   # Jupyter notebook with code
├── final_pipeline.pkl           # Trained model (90 MB)
├── requirements.txt             # Dependencies
└── README.md                    # This file

Links

GitHub Repository: https://github.com/maxdavinci2022/Credit_Risk_Prediction_Model_0.75
Author: @maxdavinci2022
