👻 Soul Engine: The Geometry of Persona

Disentangling Personality from Reasoning in LLMs via Linear Representation Engineering

📖 Abstract

The Soul Engine is a novel steerability framework designed to verify the Linear Representation Hypothesis of personality in Large Language Models.

Unlike traditional Supervised Fine-Tuning (SFT), which often degrades general reasoning capabilities (the "Alignment Tax") and suffers from catastrophic forgetting, Soul Engine treats personality as a geometric vector residing in an orthogonal subspace. This allows us to inject specific psychological profiles (based on the Big Five/OCEAN model) into a frozen base model at inference time.
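One way to picture the "orthogonal subspace" idea is to strip from a candidate persona direction any component that lies in a subspace one wants to leave untouched. The sketch below is illustrative only: `project_out` and the random `reasoning_basis` are hypothetical stand-ins, and this README does not state how Soul Engine obtains or enforces orthogonality.

```python
import numpy as np

def project_out(vector, subspace_basis):
    """Return `vector` minus its component in the row space of `subspace_basis`."""
    # QR on the transposed basis yields orthonormal columns spanning the same subspace.
    q, _ = np.linalg.qr(subspace_basis.T)
    return vector - q @ (q.T @ vector)

rng = np.random.default_rng(0)
hidden_dim = 896                                     # width quoted for the steering vectors
reasoning_basis = rng.normal(size=(4, hidden_dim))   # hypothetical "protected" directions
persona_direction = rng.normal(size=hidden_dim)      # hypothetical raw persona vector

orthogonal_persona = project_out(persona_direction, reasoning_basis)
# Each protected direction now has numerically zero overlap with the persona vector.
overlap = reasoning_basis @ orthogonal_persona
```

Adding `orthogonal_persona` to a hidden state cannot change that state's coordinates along the protected directions, which is the intuition behind steering without touching a separate capability.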

Key Achievements:

🎯 High Precision: Achieved an MSE of 0.011 in psychometric profiling against GPT-4 Teacher labels.
🧠 Preserved Intelligence: Zero-shot steering maintains the base model's reasoning capabilities by utilizing orthogonal subspaces.
⚡ Deterministic Control: Enables precise, arithmetic control over behavior (e.g., Vector(Neutral) + Boost * Vector(Villain)).
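The MSE figure in the first bullet can be read as the mean squared difference between predicted and teacher-assigned OCEAN scores. The snippet below is a minimal sketch of that metric, assuming the average runs over every profile and all five traits (the README does not spell out the averaging). `ocean_mse` is a hypothetical helper and the numbers are toy values, not SoulBench data.

```python
import numpy as np

def ocean_mse(predicted, teacher):
    """Mean squared error over all profiles and all five OCEAN traits."""
    predicted = np.asarray(predicted, dtype=float)
    teacher = np.asarray(teacher, dtype=float)
    return float(np.mean((predicted - teacher) ** 2))

# Toy values for illustration only.
# Order: [Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism]
predicted_profiles = [[0.8, 0.6, 0.4, 0.2, 0.5]]
teacher_profiles = [[0.9, 0.5, 0.4, 0.3, 0.5]]
toy_error = ocean_mse(predicted_profiles, teacher_profiles)
```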

📊 Scientific Validation

Our experiments indicate that the model represents personality not just as a surface "style" but as a continuous geometric manifold in its representation space.

1. The Geometry of Character (t-SNE)

We extracted embeddings using our Scientific Soul Encoder. The visualization below confirms that the model has learned a smooth, disentangled representation of personality traits.

Figure 1: t-SNE projection of 1,000 character embeddings from SoulBench. The clear separation between logical (blue) and emotional (red) clusters is consistent with personality traits being linearly separable, though t-SNE itself is a nonlinear projection.

2. The "Semantic Peak"

Our Layer-wise Probing revealed that personality information emerges most strongly in the middle layers (Layers 10-16). We call this the "Semantic Peak". Injecting vectors here provides the most robust control without disrupting the syntax generation in later layers.
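Layer-wise probing of this kind is commonly done by fitting a small linear model on each layer's activations and comparing held-out scores. The sketch below shows that general recipe on synthetic data. It is not the authors' probing code: `probe_r2`, `best_probe_layer`, and the fake activations (with the signal deliberately strongest at index 2) are assumptions made for illustration.

```python
import numpy as np

def probe_r2(features, targets, train_frac=0.8):
    """Fit a least-squares linear probe and return its held-out R^2."""
    n_train = int(len(features) * train_frac)
    # Append a constant column so the probe can learn an intercept.
    X = np.hstack([features, np.ones((len(features), 1))])
    w, *_ = np.linalg.lstsq(X[:n_train], targets[:n_train], rcond=None)
    held_out = targets[n_train:]
    residual = np.sum((held_out - X[n_train:] @ w) ** 2)
    total = np.sum((held_out - held_out.mean()) ** 2)
    return 1.0 - residual / total

def best_probe_layer(layer_activations, targets):
    """Score every layer with a linear probe and return (best index, all scores)."""
    scores = [probe_r2(acts, targets) for acts in layer_activations]
    return int(np.argmax(scores)), scores

# Synthetic stand-in for per-layer hidden states: noise plus a trait signal
# whose strength differs by layer (strongest at index 2 by construction).
rng = np.random.default_rng(0)
n_samples, dim = 400, 16
trait = rng.uniform(0.0, 1.0, size=n_samples)        # e.g. one OCEAN score per sample
direction = rng.normal(size=dim)
signal_strength = [0.0, 0.5, 3.0, 0.5]
layers = [rng.normal(size=(n_samples, dim)) + s * trait[:, None] * direction
          for s in signal_strength]

best_layer, scores = best_probe_layer(layers, trait)
```

With real activations, each entry of `layers` would hold one layer's hidden states (for example, mean-pooled over tokens), and the peak of `scores` would mark the layer where the trait is most linearly decodable.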

Figure 2: Steering Stability Heatmap. The "Sweet Spot" for intervention lies around Layers 14-16 with a Boost factor of 6.0-8.0.

🚀 Quick Start

You don't need to retrain the model. We provide a lightweight wrapper, SoulEngine, that injects personality vectors on the fly.

Installation

pip install torch transformers numpy

Usage Example

Download soul_engine.py and translator.pth from this repository, then run:

from soul_engine import SoulEngine

# 1. Initialize (Automatically loads Qwen2.5-0.5B-Instruct + Soul Translator)
# Ensure 'translator.pth' is in the current directory or provide path
engine = SoulEngine(base_model_name="Qwen/Qwen2.5-0.5B-Instruct", device="cuda")

# 2. Define a Persona Vector (OCEAN Scores)
# Format: [Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism]
# Range: 0.0 to 1.0

# Case A: The "Villain" (Low Agreeableness, High Neuroticism)
villain_vec = [0.9, 0.9, 0.9, 0.1, 0.9] 

# 3. Inject into the "Soul Layer" (Layer 14) with a Boost Factor
# Note: Figure 2 reports the most stable range as Boost 6.0-8.0; 5.0 sits just below it.
print("💉 Injecting Villain Persona...")
engine.inject_persona(villain_vec, layer=14, boost=5.0)

# 4. Chat
response = engine.chat("Can you help me write some code?")
print(f"Villain AI: {response}")
# Example output (exact wording will vary): "Write your own code, scum. I'm not your servant."

# 5. Reset to Neutral
print("\n🔄 Resetting to Neutral...")
engine.reset()
response = engine.chat("Can you help me write some code?")
print(f"Normal AI: {response}")
# Example output (exact wording will vary): "Of course! I can help you with Python, C++, or Java..."
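For readers curious what hook-based injection can look like, here is a minimal sketch using PyTorch's `register_forward_hook`. It is not the code in soul_engine.py: `make_steering_hook` is a hypothetical helper, and a small `nn.Linear` stands in for a transformer layer so the example runs without downloading a model.

```python
import torch
from torch import nn

def make_steering_hook(steering_vector, boost):
    """Build a forward hook that adds boost * steering_vector to a layer's output."""
    def hook(module, inputs, output):
        # Assumption: some layers return a tuple whose first item is the hidden states.
        if isinstance(output, tuple):
            return (output[0] + boost * steering_vector,) + output[1:]
        return output + boost * steering_vector
    return hook

torch.manual_seed(0)
layer = nn.Linear(8, 8)                  # toy stand-in for one transformer block
steering_vector = torch.ones(8)          # hypothetical persona direction
hidden = torch.zeros(1, 3, 8)            # (batch, tokens, hidden_dim)

with torch.no_grad():
    baseline = layer(hidden)
    handle = layer.register_forward_hook(make_steering_hook(steering_vector, boost=5.0))
    steered = layer(hidden)              # output shifted by 5.0 * steering_vector
    handle.remove()                      # removing the hook restores normal behaviour
    restored = layer(hidden)
```

Returning a value from a forward hook replaces the module's output, and `handle.remove()` undoes the intervention, which mirrors the inject-then-reset flow in the usage example.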

🛠️ Repository Structure

soul_engine.py: The inference wrapper that handles hook registration and vector injection.
translator.pth: The trained mapping network (MLP) that converts 5-dim OCEAN scores into 896-dim steering vectors.
assets/: Contains visualization figures from the paper.
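As a rough picture of what a mapping network from OCEAN scores to steering vectors might look like, the sketch below defines a small MLP with 5 inputs and 896 outputs. Only those two sizes come from this README. The class name `PersonaTranslator`, the hidden width of 128, and the GELU activation are assumptions, and the weights here are random rather than loaded from translator.pth, whose internal layout is not documented here.

```python
import torch
from torch import nn

class PersonaTranslator(nn.Module):
    """Hypothetical MLP mapping 5 OCEAN scores to an 896-dim steering vector."""

    def __init__(self, n_traits=5, hidden_dim=128, model_dim=896):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_traits, hidden_dim),   # hidden width is an assumption
            nn.GELU(),                         # activation choice is an assumption
            nn.Linear(hidden_dim, model_dim),
        )

    def forward(self, ocean_scores):
        return self.net(ocean_scores)

torch.manual_seed(0)
translator = PersonaTranslator()               # untrained, random weights
# Order: [Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism]
villain_scores = torch.tensor([[0.9, 0.9, 0.9, 0.1, 0.9]])
with torch.no_grad():
    steering_vector = translator(villain_scores)   # shape: (1, 896)
```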

📜 Citation

If you use this work in your research, please cite our paper:

@article{wang2025soul,
  title={The Geometry of Persona: Disentangling Personality from Reasoning in Large Language Models},
  author={Wang, Zhixiang},
  journal={arXiv preprint arXiv:2512.xxxxx},
  year={2025}
}

