# Colonel Blotto: Advanced RL + LLM System for NeurIPS 2025
This repository contains trained models for the Colonel Blotto game, targeting the NeurIPS 2025 MindGames workshop. The system combines cutting-edge reinforcement learning with large language model fine-tuning.
## Model Overview
This is an advanced system that achieves strong performance on Colonel Blotto through:
- Graph Neural Networks for game state representation
- FiLM layers for fast opponent adaptation
- Meta-learning for strategy portfolios
- LLM fine-tuning (SFT + DPO) for strategic reasoning
- Distillation from LLMs back to efficient RL policies
### Game Configuration
- Fields: 3
- Units per round: 20
- Rounds per game: 5
- Training episodes: N/A
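With 3 fields and 20 units, a single-round action is any split of the 20 units across the fields, giving C(22, 2) = 231 discrete actions. The sketch below shows one way such an action table can be enumerated; the actual Phase A generation code is not part of this card, so treat the ordering as illustrative.

```python
def enumerate_allocations(fields=3, units=20):
    """List every way to split `units` across `fields` battlefields."""
    actions = []

    def recurse(prefix, remaining, fields_left):
        if fields_left == 1:
            actions.append(prefix + [remaining])
            return
        for x in range(remaining + 1):
            recurse(prefix + [x], remaining - x, fields_left - 1)

    recurse([], units, fields)
    return actions

actions = enumerate_allocations()
print(len(actions))  # 231 discrete actions for F=3, U=20
```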
## Performance Results
### Against Scripted Opponents
Overall Win Rate: 0.00%
### LLM Elo Ratings
| Model | Elo Rating |
|---|---|
## Architecture
### Policy Network
The core policy network uses a sophisticated architecture:
**Graph Encoder**: Multi-layer Graph Attention Networks (GAT)
- Heterogeneous nodes: field nodes, round nodes, summary node
- Multi-head attention with 6 heads
- 3 layers of message passing
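The heterogeneous graph schema itself lives in the training code. As a rough sketch only, assuming `torch_geometric` and a flattened node set (field, round, and summary nodes sharing one feature space), a 3-layer, 6-head GAT stack looks like this; the feature sizes are illustrative, not the trained values.

```python
import torch
from torch_geometric.nn import GATConv

class GraphEncoder(torch.nn.Module):
    """Illustrative 3-layer GAT encoder over field/round/summary nodes."""
    def __init__(self, in_dim=8, hidden=64, heads=6, layers=3):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        dim = in_dim
        for _ in range(layers):
            # concat=False averages the 6 heads, keeping a constant hidden size
            self.convs.append(GATConv(dim, hidden, heads=heads, concat=False))
            dim = hidden

    def forward(self, x, edge_index):
        for conv in self.convs:
            x = torch.relu(conv(x, edge_index))
        return x  # one embedding per node; read out the summary node downstream

# Toy graph: field nodes 0-2 feed a round node 3, which feeds a summary node 4
x = torch.randn(5, 8)
edge_index = torch.tensor([[0, 1, 2, 3], [3, 3, 3, 4]])
node_embeddings = GraphEncoder()(x, edge_index)
```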
**Opponent Encoder**: MLP-based encoder for opponent modeling
- Processes opponent history
- Learns behavioral patterns
**FiLM Layers**: Feature-wise Linear Modulation
- Fast adaptation to opponent behavior
- Conditioned on opponent encoding
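A minimal sketch of this opponent-conditioning path (the history length of 30 = 10 rounds x 3 fields and the 64-dim hidden size are illustrative assumptions):

```python
import torch
import torch.nn as nn

class OpponentEncoder(nn.Module):
    """Illustrative MLP over a flattened opponent allocation history."""
    def __init__(self, hist_dim=30, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hist_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))

    def forward(self, history):
        return self.net(history)

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scale and shift the game features
    with gamma/beta predicted from the opponent embedding."""
    def __init__(self, feat_dim=64, cond_dim=64):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * feat_dim)

    def forward(self, feats, cond):
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        return (1 + gamma) * feats + beta

opponent_emb = OpponentEncoder()(torch.randn(1, 30))
modulated = FiLM()(torch.randn(1, 64), opponent_emb)
```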
**Portfolio Head**: Multi-strategy selection
- 6 specialist strategy heads
- Soft attention-based mixing
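A sketch of the soft-attention mixture over the 6 specialist heads (dimensions again illustrative; `n_actions=231` matches the F=3, U=20 action space):

```python
import torch
import torch.nn as nn

class PortfolioHead(nn.Module):
    """Illustrative mixture of 6 specialist action heads via soft attention."""
    def __init__(self, feat_dim=64, n_actions=231, n_strat=6):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(feat_dim, n_actions) for _ in range(n_strat)])
        self.attn = nn.Linear(feat_dim, n_strat)

    def forward(self, feats):
        weights = torch.softmax(self.attn(feats), dim=-1)            # (B, n_strat)
        logits = torch.stack([h(feats) for h in self.heads], dim=1)  # (B, n_strat, n_actions)
        return (weights.unsqueeze(-1) * logits).sum(dim=1)           # mixed action logits

mixed_logits = PortfolioHead()(torch.randn(2, 64))
```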
### Training Pipeline
The models were trained through a comprehensive 7-phase pipeline:
- Phase A: Environment setup and action space generation
- Phase B: PPO training against diverse scripted opponents
- Phase C: Preference dataset generation (LLM vs LLM rollouts)
- Phase D: Supervised Fine-Tuning (SFT) of base LLM
- Phase E: Direct Preference Optimization (DPO)
- Phase F: Knowledge distillation from LLM to policy (sketched below)
- Phase G: PPO refinement after distillation
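The Phase F distillation script is not included with this card, but its core objective can be sketched as a KL term that pulls the policy's action distribution toward an action distribution derived from the LLM; the teacher targets and temperature below are placeholders.

```python
import torch
import torch.nn.functional as F

def distill_loss(policy_logits, llm_action_probs, temperature=1.0):
    """KL divergence from the LLM-derived (teacher) action distribution
    to the RL policy's (student) action distribution."""
    log_student = F.log_softmax(policy_logits / temperature, dim=-1)
    return F.kl_div(log_student, llm_action_probs, reduction="batchmean")

# Toy batch: teacher probabilities over the 231 actions for 4 game states
teacher_probs = torch.softmax(torch.randn(4, 231), dim=-1)
student_logits = torch.randn(4, 231, requires_grad=True)
distill_loss(student_logits, teacher_probs).backward()
```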
## Repository Contents
### Policy Models
- `policy_models/policy_final.pt`: PyTorch checkpoint
- `policy_models/policy_after_distill.pt`: PyTorch checkpoint
- `policy_models/policy_after_ppo.pt`: PyTorch checkpoint
### Fine-tuned LLM Models
- `sft_model/`: SFT model (HuggingFace Transformers compatible)
- `dpo_model/`: DPO model (HuggingFace Transformers compatible)
### Configuration & Results
- `master_config.json`: Complete training configuration
- `battleground_eval.json`: Comprehensive evaluation results
- `eval_scripted_after_ppo.json`: Post-PPO evaluation
## Usage
### Loading Policy Model
```python
import json
import torch
from your_policy_module import PolicyNet  # placeholder for the policy implementation

# Load configuration
with open("master_config.json", "r") as f:
    config = json.load(f)

# Initialize policy
policy = PolicyNet(
    F=config["F"],
    n_actions=231,  # C(22, 2) allocations for F=3, U=20
    hidden=config["hidden"],
    gnn_layers=config["gnn_layers"],
    gnn_heads=config["gnn_heads"],
    n_strat=config["n_strat"],
)

# Load trained weights
policy.load_state_dict(torch.load("policy_models/policy_final.pt"))
policy.eval()
```
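The policy produces logits over the 231 action indices, so a sampled index still has to be mapped back to a per-field allocation. The decoder below assumes the lexicographic ordering from the enumeration sketch earlier in this card; if the Phase A action table uses a different ordering, load and index that table instead.

```python
from math import comb

def count_allocations(units, fields):
    """Stars-and-bars count of ways to split `units` over `fields`."""
    return comb(units + fields - 1, fields - 1)

def index_to_allocation(idx, fields=3, units=20):
    """Map a flat action index to a per-field allocation (lexicographic order)."""
    alloc, remaining = [], units
    for f in range(fields - 1):
        for x in range(remaining + 1):
            block = count_allocations(remaining - x, fields - f - 1)
            if idx < block:
                alloc.append(x)
                remaining -= x
                break
            idx -= block
    alloc.append(remaining)
    return alloc

print(index_to_allocation(0))    # [0, 0, 20]
print(index_to_allocation(230))  # [20, 0, 0]
```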
### Loading Fine-tuned LLM
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load SFT or DPO model
tokenizer = AutoTokenizer.from_pretrained("./sft_model")
model = AutoModelForCausalLM.from_pretrained("./sft_model")

# Use for inference (this prompt is a placeholder; match the prompt format
# used during SFT/DPO training)
prompt = "You command 20 units across 3 fields. State your allocation."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
```
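The generated text still has to be turned into a legal move. The exact prompt/response format used during SFT/DPO is not documented here, so the parser below simply assumes the reply contains three non-negative integers summing to 20.

```python
import re

def parse_allocation(text, fields=3, units=20):
    """Pull the first `fields` integers from the reply and validate them."""
    numbers = [int(n) for n in re.findall(r"\d+", text)][:fields]
    if len(numbers) == fields and min(numbers) >= 0 and sum(numbers) == units:
        return numbers
    return None  # malformed reply: fall back to a default split or resample

# Decode only the newly generated tokens, not the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
reply = tokenizer.decode(new_tokens, skip_special_tokens=True)
allocation = parse_allocation(reply)
```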
## Research Context
This work targets the NeurIPS 2025 MindGames Workshop with a focus on:
- Strategic game AI beyond traditional game-theoretic approaches
- Hybrid systems combining neural RL and LLM reasoning
- Fast adaptation to diverse opponents through meta-learning
- Efficient deployment via distillation
### Key Innovations
- Heterogeneous Graph Representation: Novel graph structure for Blotto game states
- Ground-truth Counterfactual Learning: Exploiting game determinism (see the sketch after this list)
- Multi-scale Representation: Field-level, round-level, and game-level embeddings
- LLM-to-RL Distillation: Transferring strategic reasoning to efficient policies
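On the counterfactual point: a Blotto round is deterministic given both sides' allocations, so every alternative action can be scored exactly against the opponent's observed allocation rather than estimated. A minimal sketch, assuming standard per-field majority scoring (the reward shaping used in training may differ):

```python
def round_score(mine, theirs):
    """+1 per field won, -1 per field lost, 0 for tied fields."""
    return sum((m > t) - (m < t) for m, t in zip(mine, theirs))

def counterfactual_scores(actions, observed_opponent):
    """Exact score of every alternative allocation against the observed one."""
    return [round_score(a, observed_opponent) for a in actions]

# `enumerate_allocations` is defined in the action-space sketch above
scores = counterfactual_scores(enumerate_allocations(), [7, 7, 6])
best = max(range(len(scores)), key=scores.__getitem__)
```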
## Citation
If you use this work, please cite:
```bibtex
@misc{colonelblotto2025neurips,
  title={Advanced Reinforcement Learning System for Colonel Blotto Games},
  author={{NeurIPS 2025 MindGames Submission}},
  year={2025},
  publisher={HuggingFace Hub},
  howpublished={\url{https://huggingface.co/{repo_id}}},
}
```
## License
MIT License - See LICENSE file for details
## Acknowledgments
- Built for NeurIPS 2025 MindGames Workshop
- Uses PyTorch, HuggingFace Transformers, and PEFT
- Training infrastructure: NVIDIA H200 GPU
Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
Uploaded from: Notebook Environment