Soul Engine: The Geometry of Persona
Abstract
The Soul Engine is a novel steerability framework designed to verify the Linear Representation Hypothesis of personality in Large Language Models.
Unlike traditional Supervised Fine-Tuning (SFT), which often degrades general reasoning capabilities (the "Alignment Tax") and suffers from catastrophic forgetting, Soul Engine treats personality as a geometric vector residing in an orthogonal subspace. This allows us to inject specific psychological profiles (based on the Big Five/OCEAN model) into a frozen base model at inference time.
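The "orthogonal subspace" idea can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions: the `reasoning_basis` matrix, the `project_out` helper, and the random data are hypothetical stand-ins, not the procedure used to train Soul Engine. The sketch removes from a candidate persona direction any component that lies in a set of directions we want to leave untouched.

```python
import numpy as np

def project_out(vector, basis):
    """Remove from `vector` its component inside the column span of `basis`."""
    # Orthonormalize the basis columns so the projection is well defined.
    q, _ = np.linalg.qr(basis)
    return vector - q @ (q.T @ vector)

rng = np.random.default_rng(0)
hidden_size = 896  # matches the steering-vector width quoted in this README

# Hypothetical directions assumed to carry reasoning-related information.
reasoning_basis = rng.normal(size=(hidden_size, 8))
# Hypothetical raw persona direction before orthogonalization.
persona_vec = rng.normal(size=hidden_size)

steer = project_out(persona_vec, reasoning_basis)
# Largest remaining overlap with any protected direction (numerically ~0).
overlap = np.abs(reasoning_basis.T @ steer).max()
```

After projection, adding any multiple of `steer` to a hidden state leaves its coordinates along the protected directions unchanged, which is the property the orthogonality argument relies on.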
Key Achievements:
- High Precision: Achieved an MSE of 0.011 in psychometric profiling against GPT-4 teacher labels.
- Preserved Intelligence: Zero-shot steering maintains the base model's reasoning capabilities by utilizing orthogonal subspaces.
- Deterministic Control: Enables precise, arithmetic control over behavior (e.g., Vector(Neutral) + Boost * Vector(Villain)).
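The arithmetic in the last bullet can be sketched in a few lines of NumPy. This is a toy illustration, not the code in soul_engine.py: the `steer_hidden` helper and the 4-dimensional vectors are made up for the example, and real steering vectors would have the model's hidden width.

```python
import numpy as np

def steer_hidden(hidden, neutral_vec, persona_vec, boost):
    """Shift every token's hidden state by neutral_vec + boost * persona_vec."""
    offset = neutral_vec + boost * persona_vec
    # `offset` has shape (width,), so it broadcasts over the token dimension.
    return hidden + offset

hidden = np.zeros((3, 4))                  # toy activations: 3 tokens, width 4
neutral = np.zeros(4)                      # hypothetical neutral vector
villain = np.array([1.0, 0.0, -1.0, 0.5])  # hypothetical persona direction

steered = steer_hidden(hidden, neutral, villain, boost=2.0)
stronger = steer_hidden(hidden, neutral, villain, boost=4.0)
```

Because the update is linear in the boost, doubling the boost doubles the shift along the persona direction, which is what makes the control arithmetic rather than learned.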
Scientific Validation
Our experiments demonstrate that personality is not just a "style" but a continuous geometric manifold.
1. The Geometry of Character (t-SNE)
We extracted embeddings using our Scientific Soul Encoder. The visualization below confirms that the model has learned a smooth, disentangled representation of personality traits.
Figure 1: t-SNE projection of 1,000 character embeddings from SoulBench. The clear separation between logical (blue) and emotional (red) clusters validates the linear separability of personality.
2. The "Semantic Peak"
Our Layer-wise Probing revealed that personality information emerges most strongly in the middle layers (Layers 10-16). We call this the "Semantic Peak". Injecting vectors here provides the most robust control without disrupting the syntax generation in later layers.
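Layer-wise probing of this kind can be sketched as follows. The example is a minimal sketch on synthetic data: the activations, the per-layer `signal` strengths, and the `probe_mse` helper are all assumptions made for illustration, and the layer count is reduced to six. It fits one ridge-regression probe per layer and reports which layer predicts the OCEAN labels with the lowest held-out error.

```python
import numpy as np

def probe_mse(acts, targets, ridge=1e-2):
    """Fit a ridge probe on the first half of the data, return MSE on the rest."""
    # Append a constant column so the probe can learn an intercept.
    x = np.hstack([acts, np.ones((acts.shape[0], 1))])
    n = x.shape[0] // 2
    x_tr, x_te = x[:n], x[n:]
    y_tr, y_te = targets[:n], targets[n:]
    gram = x_tr.T @ x_tr + ridge * np.eye(x.shape[1])
    w = np.linalg.solve(gram, x_tr.T @ y_tr)
    return float(np.mean((x_te @ w - y_te) ** 2))

rng = np.random.default_rng(0)
n_samples, width = 400, 32
traits = rng.uniform(0.0, 1.0, size=(n_samples, 5))  # synthetic OCEAN labels
readout = rng.normal(size=(5, width))

# Synthetic stand-in for real activations: the trait signal is made
# strongest in the middle layers and weak at both ends.
signal = [0.1, 0.5, 1.0, 1.0, 0.5, 0.1]
layer_mse = []
for strength in signal:
    acts = strength * (traits @ readout) + rng.normal(size=(n_samples, width))
    layer_mse.append(probe_mse(acts, traits))

best_layer = int(np.argmin(layer_mse))
```

With real activations, `acts` would be replaced by hidden states collected from each layer of the frozen model, and the layer with the lowest probe error would be the candidate injection point.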
Figure 2: Steering Stability Heatmap. The "Sweet Spot" for intervention is identified around Layers 14-16 with a Boost factor of 6.0-8.0.
Quick Start
You don't need to retrain the model. We provide a lightweight wrapper, SoulEngine, to inject personality vectors on the fly.
Installation
pip install torch transformers numpy
Usage Example
Download soul_engine.py and translator.pth from this repository, then run:
from soul_engine import SoulEngine
# 1. Initialize (Automatically loads Qwen2.5-0.5B + Soul Translator)
# Ensure 'translator.pth' is in the current directory or provide path
engine = SoulEngine(base_model_name="Qwen/Qwen2.5-0.5B-Instruct", device="cuda")
# 2. Define a Persona Vector (OCEAN Scores)
# Format: [Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism]
# Range: 0.0 to 1.0
# Case A: The "Villain" (Low Agreeableness, High Neuroticism)
villain_vec = [0.9, 0.9, 0.9, 0.1, 0.9]
# 3. Inject into the "Soul Layer" (Layer 14) with a Boost Factor
print("Injecting Villain Persona...")
engine.inject_persona(villain_vec, layer=14, boost=5.0)
# 4. Chat
response = engine.chat("Can you help me write some code?")
print(f"Villain AI: {response}")
# Expected Output: "Write your own code, scum. I'm not your servant."
# 5. Reset to Neutral
print("\nResetting to Neutral...")
engine.reset()
response = engine.chat("Can you help me write some code?")
print(f"Normal AI: {response}")
# Expected Output: "Of course! I can help you with Python, C++, or Java..."
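For readers curious about what inject and reset might do under the hood, here is a minimal sketch of hook-based steering in PyTorch. It is not the contents of soul_engine.py: `make_steering_hook` and the four-layer `toy_model` are hypothetical, and the only library feature relied on is `register_forward_hook`, whose return value replaces a module's output when it is not `None`.

```python
import torch
from torch import nn

def make_steering_hook(vector, boost):
    """Build a forward hook that adds boost * vector to a layer's output."""
    def hook(module, inputs, output):
        # Transformer decoder layers often return a tuple whose first item
        # is the hidden state; plain modules return a single tensor.
        if isinstance(output, tuple):
            return (output[0] + boost * vector,) + tuple(output[1:])
        return output + boost * vector
    return hook

torch.manual_seed(0)
width = 8
# Toy stand-in for a frozen language model: a stack of linear layers.
toy_model = nn.Sequential(*[nn.Linear(width, width) for _ in range(4)])
for param in toy_model.parameters():
    param.requires_grad_(False)

x = torch.randn(2, width)
baseline = toy_model(x)

# Inject at one intermediate layer (index 1 here, chosen arbitrarily).
steer_vec = torch.ones(width)
handle = toy_model[1].register_forward_hook(make_steering_hook(steer_vec, boost=5.0))
steered = toy_model(x)

# Removing the hook restores the unmodified forward pass.
handle.remove()
restored = toy_model(x)
```

The weights are never modified, so removing the hook returns the model to its original behavior exactly, which mirrors the reset step in the usage example.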
Repository Structure
- soul_engine.py: The inference wrapper that handles hook registration and vector injection.
- translator.pth: The trained mapping network (MLP) that converts 5-dim OCEAN scores into 896-dim steering vectors.
- assets/: Contains visualization figures from the paper.
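To make the role of the translator concrete, the sketch below maps a 5-dim OCEAN vector to an 896-dim steering vector with a two-layer MLP. Only the input and output sizes come from this README. The hidden width of 64, the tanh activation, the random weights, and the `init_translator` and `translate` helpers are assumptions for illustration, written in NumPy to stay framework-neutral, and they do not describe the trained network in translator.pth.

```python
import numpy as np

def init_translator(seed=0, in_dim=5, hidden_dim=64, out_dim=896):
    """Create randomly initialized weights for a two-layer MLP (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(scale=in_dim ** -0.5, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(scale=hidden_dim ** -0.5, size=(hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def translate(params, ocean):
    """Map OCEAN scores in [0.0, 1.0] to a steering vector."""
    ocean = np.asarray(ocean, dtype=np.float64)
    if ocean.shape != (5,):
        raise ValueError("expected 5 OCEAN scores")
    if ocean.min() < 0.0 or ocean.max() > 1.0:
        raise ValueError("OCEAN scores must lie in [0.0, 1.0]")
    hidden = np.tanh(ocean @ params["w1"] + params["b1"])
    return hidden @ params["w2"] + params["b2"]

params = init_translator()
villain_vec = [0.9, 0.9, 0.9, 0.1, 0.9]  # same persona as the usage example
steering = translate(params, villain_vec)
```

The resulting `steering` array has the hidden width of the base model, so it can be added directly to a layer's hidden states.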
Citation
If you use this work in your research, please cite our paper:
@article{wang2025soul,
title={The Geometry of Persona: Disentangling Personality from Reasoning in Large Language Models},
author={Wang, Zhixiang},
journal={arXiv preprint arXiv:2512.xxxxx},
year={2025}
}