Credit Risk Prediction Model

Description

A machine learning model that predicts the probability of default for bank clients. It combines CatBoost and LightGBM in a weighted ensemble built on more than 170 engineered features.

Business Context

The project develops a credit risk assessment system for the banking sector. The primary goal is to reduce bank losses by automating the prediction of client default probability.

Model Performance

Metric       Value
ROC-AUC      0.7523
Target KPI   0.75
Status       ✅ Achieved
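
For readers less familiar with the metric, ROC-AUC can be read as the probability that a randomly chosen defaulting client receives a higher score than a randomly chosen non-defaulting one. The sketch below checks that reading on a toy example with scikit-learn. The labels and scores are invented for illustration and are not taken from the model.

```python
from itertools import product

from sklearn.metrics import roc_auc_score

# Toy labels (1 = default) and scores, invented for illustration only.
y_true = [0, 0, 1, 0, 1, 0]
y_score = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]

auc = roc_auc_score(y_true, y_score)

# Same quantity by hand: share of (positive, negative) pairs ranked correctly.
pos = [s for s, y in zip(y_score, y_true) if y == 1]
neg = [s for s, y in zip(y_score, y_true) if y == 0]
pairwise = sum(p > n for p, n in product(pos, neg)) / (len(pos) * len(neg))

print(f"roc_auc_score: {auc:.4f}, pairwise estimate: {pairwise:.4f}")
```

Because the metric depends only on how clients are ordered, it is unaffected by any monotonic rescaling of the scores.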

Tech Stack

  • Language: Python 3.10
  • Big Data Processing: Polars (Lazy Loading)
  • Machine Learning:
    • CatBoost (weight: 0.05)
    • LightGBM (weight: 0.95)
  • Infrastructure: GPU acceleration (NVIDIA RTX 3050)
  • Tools: Scikit-learn, Scipy, Pandas, Matplotlib, Seaborn

Dataset

  • Records: 3,000,000
  • Files: 12 Parquet files
  • Size: 4.5 GB
  • Class Imbalance: 1:49 (2% positive class)
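
To make the 1:49 imbalance concrete, the sketch below builds synthetic labels with 2% positives, checks that a stratified split keeps that rate in every validation fold, and computes the negative-to-positive ratio that is often used as a starting value for `scale_pos_weight`. The data is synthetic and the five-fold setup is an assumption; the project's actual validation scheme is described only as "stratified".

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)

# Synthetic labels: 100 positives out of 5,000 rows (2%, i.e. 1:49).
y = np.zeros(5000, dtype=int)
y[:100] = 1
rng.shuffle(y)
X = rng.normal(size=(5000, 3))  # placeholder features

# Stratified folds preserve the class ratio in each validation split.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_rates = [float(y[val_idx].mean()) for _, val_idx in skf.split(X, y)]

# Common heuristic starting point for scale_pos_weight: negatives / positives.
neg_pos_ratio = float((y == 0).sum() / (y == 1).sum())

print("positive rate per fold:", fold_rates)
print("negatives / positives:", neg_pos_ratio)
```

The heuristic gives 49 for this class ratio. The value of 27.18 reported under Engineering Solutions is lower, and how it was chosen is not documented here.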

Key Features

Over 170 engineered features including:

  • utilization_ratio: credit limit usage level
  • overdue_ratio: share of overdue debt
  • delays_per_loan: frequency of critical delays (90+ days)

Usage

Installation

pip install -r requirements.txt

import joblib
import polars as pl

# Load model
model = joblib.load("final_pipeline.pkl")

# Load data
df = pl.read_parquet("client_data.parquet")

# Make predictions
predictions = model.predict(df)
probabilities = model.predict_proba(df)

# Results
print(f"Default probability: {probabilities[:, 1]}")

from huggingface_hub import hf_hub_download
import joblib

# Download model
model_path = hf_hub_download(
    repo_id="maxdavinci/Credit_Risk_Prediction_Model_0.75",
    filename="final_pipeline.pkl"
)

# Load and use
model = joblib.load(model_path)

Engineering Solutions

Scalability: Polars for efficient Big Data processing
Class Imbalance: Stratified validation + scale_pos_weight (27.18)
Ensembling: Rank Averaging method for stability
Production Ready: Custom CreditEnsemble class compatible with sklearn.pipeline
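
One common way to implement weighted rank averaging is sketched below with SciPy. Each model's scores are converted to normalized ranks, and the ranks are then blended with the weights listed under Tech Stack (CatBoost 0.05, LightGBM 0.95). The `rank_average` helper and the example scores are hypothetical, and whether the CreditEnsemble class normalizes ranks in exactly this way is an assumption.

```python
import numpy as np
from scipy.stats import rankdata


def rank_average(score_sets, weights):
    """Blend several score vectors by weighted average of their normalized ranks."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # make the weights sum to 1
    # rankdata assigns rank 1 to the smallest score; ties get the average rank.
    ranks = [rankdata(scores) / len(scores) for scores in score_sets]
    return sum(w * r for w, r in zip(weights, ranks))


# Invented scores from two models for the same four clients.
catboost_scores = np.array([0.10, 0.40, 0.35, 0.80])
lightgbm_scores = np.array([0.02, 0.30, 0.50, 0.90])

blended = rank_average([catboost_scores, lightgbm_scores], weights=[0.05, 0.95])
print(blended)
```

Working on ranks makes the blend insensitive to differences in how the two models scale their outputs. The result is a relative score in (0, 1] within the scored batch, not a calibrated probability.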

Project Structure

Credit_Risk_Prediction_Model_0.75/
├── credit_risk_modeling.ipynb   # Jupyter notebook with code
├── final_pipeline.pkl           # Trained model (90 MB)
├── requirements.txt             # Dependencies
└── README.md                    # This file

Links

GitHub Repository: https://github.com/maxdavinci2022/Credit_Risk_Prediction_Model_0.75
Author: @maxdavinci2022