Pattern Classifier

This model is a multi-label classifier that predicts which patterns a subject model was trained on, using the subject model's neuron activation signatures as input.

Dataset

Training Dataset: maximuspowers/muat-pca-5
Input Mode: signature
Number of Patterns: 14

Patterns

The model predicts which of the following 14 patterns the subject model was trained on:

palindrome
sorted_ascending
sorted_descending
alternating
contains_abc
starts_with
ends_with
no_repeats
has_majority
increasing_pairs
decreasing_pairs
vowel_consonant
first_last_match
mountain_pattern
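
The BCE loss and the Hamming and exact-match metrics reported in this card suggest a multi-label setup, in which each target is a 14-dimensional multi-hot vector. The following is a minimal sketch of such an encoding. The index order simply follows the list above and the helper name `encode_patterns` is hypothetical; the label order actually used by the dataset may differ.

```python
# Pattern names as listed in this card; the index order is assumed, not confirmed.
PATTERNS = [
    "palindrome", "sorted_ascending", "sorted_descending", "alternating",
    "contains_abc", "starts_with", "ends_with", "no_repeats", "has_majority",
    "increasing_pairs", "decreasing_pairs", "vowel_consonant",
    "first_last_match", "mountain_pattern",
]

def encode_patterns(names):
    """Return a multi-hot list with 1.0 at the index of each named pattern."""
    unknown = set(names) - set(PATTERNS)
    if unknown:
        raise ValueError(f"unknown patterns: {sorted(unknown)}")
    return [1.0 if p in names else 0.0 for p in PATTERNS]

# A subject model trained on two patterns gets two active positions.
target = encode_patterns(["palindrome", "no_repeats"])
```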

Model Architecture

Signature Encoder: [512, 256, 256, 128]
Activation: relu
Dropout: 0.2
Batch Normalization: True
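
As an illustration of how these settings could fit together, the sketch below builds an MLP with the listed hidden widths in PyTorch. Several details are assumptions and not confirmed by this card: the input size (`input_dim=64` is a placeholder), the layer order inside each block (Linear, BatchNorm, ReLU, Dropout), and the final linear head that emits one logit per pattern. The function name `build_classifier` is hypothetical.

```python
import torch
from torch import nn

def build_classifier(input_dim, hidden=(512, 256, 256, 128), num_patterns=14, dropout=0.2):
    """Hypothetical reconstruction: MLP encoder followed by a linear head of logits."""
    layers, width = [], input_dim
    for h in hidden:
        # Layer order (Linear -> BatchNorm -> ReLU -> Dropout) is an assumption.
        layers += [nn.Linear(width, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(dropout)]
        width = h
    layers.append(nn.Linear(width, num_patterns))  # one logit per pattern
    return nn.Sequential(*layers)

model = build_classifier(input_dim=64)  # 64 is a placeholder signature size
model.eval()  # disable dropout and use running batch-norm statistics
with torch.no_grad():
    logits = model(torch.randn(4, 64))  # batch of 4 random signatures
```

The output has shape (batch, 14) and holds raw logits; a sigmoid turns them into per-pattern probabilities.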

Training Configuration

Optimizer: adam
Learning Rate: 0.001
Batch Size: 16
Loss Function: BCE with Logits (with pos_weight for training, unweighted for validation)
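
To make the effect of `pos_weight` concrete, here is a small plain-Python sketch of the per-label loss. It follows the formula used by `torch.nn.BCEWithLogitsLoss`, where `pos_weight` scales only the positive-target term. The weight value 5.0 is purely illustrative; the weights used to train this model are not stated in the card.

```python
import math

def bce_with_logits(logit, target, pos_weight=1.0):
    """Binary cross-entropy on a raw logit; pos_weight scales the positive term."""
    # log(sigmoid(x)) computed in a numerically stable way for either sign of x.
    if logit >= 0:
        log_sig = -math.log1p(math.exp(-logit))
    else:
        log_sig = logit - math.log1p(math.exp(logit))
    # log(1 - sigmoid(x)) equals log(sigmoid(x)) - x.
    log_one_minus_sig = log_sig - logit
    return -(pos_weight * target * log_sig + (1.0 - target) * log_one_minus_sig)

# A missed positive (target 1, negative logit) is penalized more when weighted.
unweighted = bce_with_logits(-2.0, 1.0)                  # validation-style loss
weighted = bce_with_logits(-2.0, 1.0, pos_weight=5.0)    # training-style loss (illustrative weight)
# Negative targets are unaffected by pos_weight.
negative = bce_with_logits(-2.0, 0.0, pos_weight=5.0)
```

Because the training loss is weighted and the validation loss is not, the two values are not directly comparable.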

Test Set Performance

F1 Macro: 0.0968
F1 Micro: 0.1159
Hamming Accuracy: 0.7788
Exact Match Accuracy: 0.0152
BCE Loss: 0.5611
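
The gap between Hamming accuracy and the F1 scores is easier to read with the definitions in hand. The sketch below computes the four metrics in plain Python on a toy example. It assumes that Hamming accuracy means the fraction of individual label decisions that are correct and that a label with no positives and no predictions scores an F1 of 0; the evaluation code behind this card may use different conventions. With sparse labels, a classifier that predicts mostly negatives gets most label decisions right while still missing most positives, which is likely why the two kinds of score diverge here.

```python
def f1(tp, fp, fn):
    """F1 from counts; defined as 0.0 when there is nothing to score (an assumption)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def multilabel_metrics(y_true, y_pred):
    """y_true, y_pred: lists of equal-length 0/1 lists (samples x labels)."""
    n, k = len(y_true), len(y_true[0])
    tp, fp, fn = [0] * k, [0] * k, [0] * k
    correct_bits = exact = 0
    for t_row, p_row in zip(y_true, y_pred):
        exact += int(t_row == p_row)
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            correct_bits += int(t == p)
            tp[j] += int(t == 1 and p == 1)
            fp[j] += int(t == 0 and p == 1)
            fn[j] += int(t == 1 and p == 0)
    return {
        "f1_macro": sum(f1(tp[j], fp[j], fn[j]) for j in range(k)) / k,
        "f1_micro": f1(sum(tp), sum(fp), sum(fn)),
        "hamming_accuracy": correct_bits / (n * k),
        "exact_match": exact / n,
    }

# Toy data: 3 samples, 4 labels, one positive label per sample.
y_true = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
y_pred = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
metrics = multilabel_metrics(y_true, y_pred)
```

On this toy data, 9 of 12 label decisions are correct (Hamming accuracy 0.75), yet only one of four labels is ever detected (macro F1 0.25).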

Per-Pattern Recall (Test Set)

For each pattern, the percentage of subject models trained on that pattern for which the classifier detects it:

Pattern	Recall (Detection Rate)
palindrome	28.6%
sorted_ascending	15.5%
sorted_descending	32.6%
alternating	30.2%
contains_abc	41.6%
starts_with	8.6%
ends_with	19.2%
no_repeats	9.8%
has_majority	8.3%
increasing_pairs	17.4%
decreasing_pairs	3.8%
vowel_consonant	0.0%
first_last_match	22.5%
mountain_pattern	8.8%
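
For reference, per-pattern recall of this kind can be computed as in the sketch below. The function name `per_pattern_recall` and the toy data are hypothetical, and predictions are assumed to be already binarized.

```python
def per_pattern_recall(y_true, y_pred, names):
    """Recall per label: detected positives / actual positives (None if no positives)."""
    recalls = {}
    for j, name in enumerate(names):
        # Keep only the predictions for samples where label j is truly present.
        hits = [p[j] for t, p in zip(y_true, y_pred) if t[j] == 1]
        recalls[name] = sum(hits) / len(hits) if hits else None
    return recalls

# Toy data with two labels; rows are subject models.
names = ["palindrome", "alternating"]
y_true = [[1, 0], [1, 1], [0, 1], [1, 0]]
y_pred = [[1, 0], [0, 1], [0, 0], [0, 0]]
recalls = per_pattern_recall(y_true, y_pred, names)
```

Recall ignores false positives, so it should be read alongside the F1 scores above.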

Usage

import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub
checkpoint_path = hf_hub_download(
    repo_id='maximuspowers/muat-pca-5-classifier',
    filename='best_model.pt',
)
# Load on CPU so the checkpoint also opens on machines without a GPU
checkpoint = torch.load(checkpoint_path, map_location='cpu')
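
Once a model built from the checkpoint produces 14 logits for a signature, they still need to be mapped back to pattern names. The sketch below shows one way to do that in plain Python. The 0.5 threshold, the label order, and the helper name `decode_logits` are assumptions; the card does not state how predictions were binarized.

```python
import math

# Label order assumed to follow the pattern list in this card.
PATTERNS = [
    "palindrome", "sorted_ascending", "sorted_descending", "alternating",
    "contains_abc", "starts_with", "ends_with", "no_repeats", "has_majority",
    "increasing_pairs", "decreasing_pairs", "vowel_consonant",
    "first_last_match", "mountain_pattern",
]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def decode_logits(logits, threshold=0.5):
    """Return names of patterns whose probability meets the (assumed) threshold."""
    if len(logits) != len(PATTERNS):
        raise ValueError("expected one logit per pattern")
    return [name for name, z in zip(PATTERNS, logits) if sigmoid(z) >= threshold]

# Example with made-up logits: only positions 0 and 4 are positive.
example_logits = [-3.0] * 14
example_logits[0] = 0.1
example_logits[4] = 1.2
predicted = decode_logits(example_logits)
```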

Repository

Model: maximuspowers/muat-pca-5-classifier
Collection: Meta-UAT (weight space learning experiments: interpreting behavior through activation signatures)