
HIPAA-BERT: PII/PHI Column Name Classifier

A fine-tuned BERT model for classifying database column names as PII (Personally Identifiable Information), PHI (Protected Health Information), or Other (O).

Model Details

| Property | Value |
|---|---|
| Developer | KronosX AI Labs |
| Model Type | BERT + LoRA (text classification) |
| Base Model | bert-base-uncased |
| Language | English |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| Task | Sequence classification (3 classes) |

Labels

| Label | Description | Examples |
|---|---|---|
| O | Other/safe columns | id, created_at, status |
| PII | Personally Identifiable Information | email, phone_number, address |
| PHI | Protected Health Information (HIPAA) | diagnosis_code, patient_name, ssn |

Training Details

Hyperparameters

| Parameter | Value |
|---|---|
| Learning Rate | 1e-3 |
| Batch Size | 64 |
| Epochs | 10 |
| Weight Decay | 0.01 |
| Max Sequence Length | 64 tokens |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.1 |
| Target Modules | query, value |
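LoRA freezes each targeted weight matrix W (here the `query` and `value` projections) and trains a low-rank update instead. The adapted layer computes W x + (alpha / r) B A x, where B is d_out × r and A is r × d_in. The sketch below works through what the table's settings imply. It assumes the standard bert-base-uncased dimensions (hidden size 768, 12 encoder layers) and does not count the classification head or any other modules trained alongside the adapters. All helper names are illustrative.

```python
# Illustrative back-of-the-envelope LoRA arithmetic for the settings above.
# Assumption: bert-base-uncased dimensions (hidden size 768, 12 encoder layers).

HIDDEN_SIZE = 768
NUM_LAYERS = 12
TARGET_MODULES = ("query", "value")  # from the hyperparameter table
R = 16                               # LoRA rank
ALPHA = 32                           # LoRA alpha


def lora_scaling(alpha: int, r: int) -> float:
    """Factor applied to the low-rank update B @ A."""
    return alpha / r


def lora_params_per_matrix(d_in: int, d_out: int, r: int) -> int:
    """Trainable weights for one adapted matrix: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


def full_params_per_matrix(d_in: int, d_out: int) -> int:
    """Weights in the frozen d_out x d_in matrix (bias excluded)."""
    return d_in * d_out


per_matrix = lora_params_per_matrix(HIDDEN_SIZE, HIDDEN_SIZE, R)
adapted_matrices = NUM_LAYERS * len(TARGET_MODULES)
total_lora = per_matrix * adapted_matrices

print(f"scaling alpha/r        = {lora_scaling(ALPHA, R)}")
print(f"LoRA params per matrix = {per_matrix}")
print(f"full params per matrix = {full_params_per_matrix(HIDDEN_SIZE, HIDDEN_SIZE)}")
print(f"adapted matrices       = {adapted_matrices}")
print(f"total LoRA params      = {total_lora}")
```

Under these assumptions, each update is scaled by 2.0. The 24 adapted matrices add 589,824 trainable weights (24 × 24,576). That equals the size of a single full 768 × 768 projection in a model of roughly 110M parameters.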

Training Data

A custom, HIPAA-compliant dataset of more than 50,000 labeled column names from healthcare databases.

Hardware

  • GPU: NVIDIA GPU (Kaggle)
  • Mixed Precision: FP16 enabled

Performance Metrics

| Metric | Score |
|---|---|
| Accuracy | ~95%+ |
| F1 (weighted) | ~94%+ |
| Precision | ~93%+ |
| Recall | ~94%+ |
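The card reports a weighted F1 score. Under the common definition (the one scikit-learn uses for `average="weighted"`), a separate F1 is computed for each class and the results are averaged, weighting each class by its count in the true labels. The following is a minimal pure-Python sketch. The toy labels are made up for illustration and are not the model's evaluation data.

```python
from collections import Counter

LABELS = ("O", "PII", "PHI")


def per_class_f1(y_true, y_pred, label):
    """F1 for one label, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Average of per-class F1 scores, weighted by each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(per_class_f1(y_true, y_pred, lab) * support[lab] / total for lab in labels)


# Hypothetical toy labels, for illustration only.
y_true = ["O", "O", "O", "PII", "PII", "PHI"]
y_pred = ["O", "O", "PII", "PII", "PII", "PHI"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8333
```

Here O and PII each score an F1 of 0.8 and PHI scores 1.0. Weighting by support (3, 2, 1) gives 5/6.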

Usage

Installation

```bash
pip install transformers torch
```

Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model_name = "KronosXAI/HIPAA-BERT-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Label ids as listed in this card; defined once, outside the loop
label_map = {0: "O", 1: "PII", 2: "PHI"}

# Classify column names
columns = ["patient_name", "diagnosis_code", "created_at", "email", "status"]
for col in columns:
    inputs = tokenizer(col, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    print(f"{col}: {label_map[prediction]}")
```
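The Quick Start keeps only the argmax. To also see how confident the model is, pass the logits through a softmax. The dependency-free sketch below takes one row of logits as a plain list, for example `outputs.logits[0].tolist()` from the loop above, and uses the label order given in this card. The helper names are illustrative, and the logits in the example are made up.

```python
import math

LABEL_MAP = {0: "O", 1: "PII", 2: "PHI"}  # label order as given in this card


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, label_map=LABEL_MAP):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return label_map[best], probs[best]


# Made-up logits for illustration; real values come from the model.
label, conf = label_with_confidence([0.2, -1.0, 3.1])
print(f"{label} ({conf:.2%})")
```

Subtracting the maximum logit before exponentiating leaves the probabilities unchanged and avoids overflow for large logits.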

Expected Output

```
patient_name: PHI
diagnosis_code: PHI
created_at: O
email: PII
status: O
```

Intended Use

Primary Use Cases

  • Automatic PII/PHI detection in database schemas
  • Data privacy compliance audits
  • HIPAA compliance automation
  • Healthcare data anonymization pipelines
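For schema-level detection, an audit amounts to listing every column name and classifying it. The minimal sketch below uses Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info`. To keep the example self-contained, `classify_column` is a hypothetical stand-in (a crude keyword rule); in practice it would wrap the model call from the Quick Start. Other databases expose the same information through their own catalogs, such as `information_schema.columns`.

```python
import sqlite3


def classify_column(name):
    """Hypothetical placeholder for the model; replace with a call to HIPAA-BERT."""
    if "diagnosis" in name or "patient" in name:
        return "PHI"
    if name in {"email", "phone_number", "address"}:
        return "PII"
    return "O"


def scan_schema(conn, classify=classify_column):
    """Yield (table, column, label) for every column in a SQLite database."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = table.replace('"', '""')  # escape the identifier for the PRAGMA
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        for row in conn.execute(f'PRAGMA table_info("{quoted}")'):
            yield table, row[1], classify(row[1])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (id INTEGER, patient_name TEXT, diagnosis_code TEXT, created_at TEXT)"
)
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, status TEXT)")

findings = [f for f in scan_schema(conn) if f[2] != "O"]
for table, column, label in findings:
    print(f"{table}.{column}: {label}")
```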

Out-of-Scope

  • This model classifies column names, not the actual data content
  • Not suitable for classifying free-text or unstructured data
  • Should be used as part of a larger compliance workflow, not as the sole arbiter
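Because the model should not be the sole arbiter, one way to fit it into a larger workflow is to accept confident predictions automatically and queue the rest for human review. The sketch below assumes predictions arrive as (column, label, probability) tuples, for example from a softmax over the logits. The 0.9 threshold and the example predictions are illustrative assumptions, not values recommended by the model's authors.

```python
from typing import Iterable, List, Tuple

Prediction = Tuple[str, str, float]  # (column name, predicted label, probability)


def triage(predictions: Iterable[Prediction], threshold: float = 0.9):
    """Split predictions into auto-accepted and needs-human-review lists.

    The threshold is an illustrative assumption; tune it on validation data.
    """
    accepted: List[Prediction] = []
    review: List[Prediction] = []
    for column, label, prob in predictions:
        target = accepted if prob >= threshold else review
        target.append((column, label, prob))
    return accepted, review


# Hypothetical predictions, for illustration only.
preds = [
    ("patient_name", "PHI", 0.98),
    ("notes", "O", 0.61),
    ("email", "PII", 0.95),
    ("ref_no", "PII", 0.55),
]
accepted, review = triage(preds)
print("auto:", [c for c, _, _ in accepted])
print("review:", [c for c, _, _ in review])
```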

Limitations & Bias

  • Trained primarily on English column naming conventions
  • May not generalize to non-standard or domain-specific naming patterns
  • Should be validated with domain experts before production use

Model Card Authors

Abishek - KronosX AI Labs

Citation

```bibtex
@misc{hipaa-bert-2024,
  author = {KronosX AI Labs},
  title  = {HIPAA-BERT: PII/PHI Column Name Classifier},
  year   = {2026},
  url    = {https://huggingface.co/KronosXAI/HIPAA-BERT-v0.1}
}
```

Links

  • Organization: KronosX AI Labs

Model size: 0.1B params (Safetensors, tensor type F32)