Colonel Blotto: Advanced RL + LLM System for NeurIPS 2025


This repository contains trained models for the Colonel Blotto game, targeting the NeurIPS 2025 MindGames Workshop. The system combines reinforcement learning with large language model fine-tuning.

🎯 Model Overview

The system approaches Colonel Blotto through:

  • Graph Neural Networks for game state representation
  • FiLM layers for fast opponent adaptation
  • Meta-learning for strategy portfolios
  • LLM fine-tuning (SFT + DPO) for strategic reasoning
  • Distillation from LLMs back to efficient RL policies

Game Configuration

  • Fields: 3
  • Units per round: 20
  • Rounds per game: 5
  • Training episodes: N/A
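
The discrete action space follows directly from this configuration: splitting 20 indistinguishable units across 3 fields gives C(22, 2) = 231 possible allocations per round, which matches the n_actions value used in the loading example further below. A minimal sketch of that enumeration (variable names here are illustrative, not taken from the repository code):

```python
F, U = 3, 20  # fields and units per round, from the configuration above

# Every way to split U identical units across F ordered fields.
# For F = 3 this is C(U + 2, 2) = C(22, 2) = 231 allocations per round.
actions = [(a, b, U - a - b) for a in range(U + 1) for b in range(U - a + 1)]
assert len(actions) == 231
```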

πŸ“Š Performance Results

Against Scripted Opponents

Overall Win Rate: 0.00%

LLM Elo Ratings

| Model | Elo Rating |
|-------|------------|

πŸ—οΈ Architecture

Policy Network

The core policy network combines four components, sketched together after this list:

  1. Graph Encoder: Multi-layer Graph Attention Networks (GAT)

    • Heterogeneous nodes: field nodes, round nodes, summary node
    • Multi-head attention with 6 heads
    • 3 layers of message passing
  2. Opponent Encoder: MLP-based encoder for opponent modeling

    • Processes opponent history
    • Learns behavioral patterns
  3. FiLM Layers: Feature-wise Linear Modulation

    • Fast adaptation to opponent behavior
    • Conditioned on opponent encoding
  4. Portfolio Head: Multi-strategy selection

    • 6 specialist strategy heads
    • Soft attention-based mixing
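
The actual network lives in the repository's own code; the sketch below is only meant to show how the four components fit together, assuming a torch_geometric GATConv backbone. Class names, dimensions, and the forward interface are illustrative, not the repository's actual PolicyNet.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv  # assumed GNN backbone


class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scale and shift features given a conditioning vector."""
    def __init__(self, cond_dim, feat_dim):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * feat_dim)

    def forward(self, x, cond):
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        return gamma * x + beta


class BlottoPolicySketch(nn.Module):
    """Illustrative only: GAT encoder -> opponent MLP -> FiLM -> portfolio of strategy heads."""
    def __init__(self, node_dim, opp_dim, hidden=128, heads=6, layers=3,
                 n_strat=6, n_actions=231):
        super().__init__()
        # 1. Graph encoder: stacked multi-head GAT layers over the heterogeneous game graph
        in_dims = [node_dim] + [hidden] * (layers - 1)
        self.gnn = nn.ModuleList(
            [GATConv(d, hidden, heads=heads, concat=False) for d in in_dims]
        )
        # 2. Opponent encoder: MLP over a flattened opponent-history vector
        self.opp_enc = nn.Sequential(nn.Linear(opp_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden))
        # 3. FiLM: condition the pooled game embedding on the opponent encoding
        self.film = FiLM(hidden, hidden)
        # 4. Portfolio head: n_strat specialist heads mixed by soft attention
        self.strategy_heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_strat)]
        )
        self.mixer = nn.Linear(hidden, n_strat)

    def forward(self, x, edge_index, opp_history):
        for conv in self.gnn:
            x = torch.relu(conv(x, edge_index))   # message passing over field/round/summary nodes
        summary = x.mean(dim=0)                   # pooled game-state embedding
        cond = self.opp_enc(opp_history)
        h = self.film(summary, cond)              # fast adaptation to the current opponent
        weights = torch.softmax(self.mixer(h), dim=-1)          # soft attention over strategies
        logits = torch.stack([head(h) for head in self.strategy_heads], dim=0)
        return (weights.unsqueeze(-1) * logits).sum(dim=0)      # mixed logits over 231 allocations
```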

Training Pipeline

The models were trained through a seven-phase pipeline:

  1. Phase A: Environment setup and action space generation
  2. Phase B: PPO training against diverse scripted opponents
  3. Phase C: Preference dataset generation (LLM vs LLM rollouts)
  4. Phase D: Supervised Fine-Tuning (SFT) of base LLM
  5. Phase E: Direct Preference Optimization (DPO)
  6. Phase F: Knowledge distillation from LLM to policy (see the loss sketch after this list)
  7. Phase G: PPO refinement after distillation
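
Phase F can be pictured as minimizing the divergence between an action distribution derived from the fine-tuned LLM's rollouts and the RL policy's output over the same 231 allocations. The sketch below shows one common way to write such a loss; the teacher-target construction and the policy's forward interface are assumptions, not the repository's actual training code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(policy_logits, teacher_probs, temperature=1.0):
    """KL(teacher || student) between an LLM-derived action distribution
    and the RL policy's logits over the discrete Blotto allocations."""
    log_student = F.log_softmax(policy_logits / temperature, dim=-1)
    return F.kl_div(log_student, teacher_probs, reduction="batchmean")

# Hypothetical training step (teacher_probs would come from the Phase C/E LLM rollouts):
# policy_logits = policy(x, edge_index, opp_history)       # student forward pass
# loss = distillation_loss(policy_logits, teacher_probs)   # then loss.backward(); optimizer.step()
```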

πŸ“¦ Repository Contents

Policy Models

  • policy_models/policy_final.pt: PyTorch checkpoint
  • policy_models/policy_after_distill.pt: PyTorch checkpoint
  • policy_models/policy_after_ppo.pt: PyTorch checkpoint

Fine-tuned LLM Models

  • sft_model/: SFT model (HuggingFace Transformers compatible)
  • dpo_model/: DPO model (HuggingFace Transformers compatible)

Configuration & Results

  • master_config.json: Complete training configuration
  • battleground_eval.json: Comprehensive evaluation results
  • eval_scripted_after_ppo.json: Post-PPO evaluation

πŸš€ Usage

Loading Policy Model

```python
import json
import torch

from your_policy_module import PolicyNet

# Load configuration
with open("master_config.json", "r") as f:
    config = json.load(f)

# Initialize policy
policy = PolicyNet(
    F=config["F"],
    n_actions=231,  # number of allocations for F=3, U=20
    hidden=config["hidden"],
    gnn_layers=config["gnn_layers"],
    gnn_heads=config["gnn_heads"],
    n_strat=config["n_strat"]
)

# Load trained weights
policy.load_state_dict(torch.load("policy_models/policy_final.pt", map_location="cpu"))
policy.eval()
```
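
Acting with the loaded policy is then a forward pass over the encoded game state followed by sampling (or taking the argmax) over the allocation logits. The forward interface and the observation encoding are defined in your_policy_module, so the snippet below is only a hypothetical usage pattern; encode_observation and game_state are placeholders, not repository code.

```python
# Hypothetical usage: encode_observation() stands in for the repository's own
# graph-construction code; the policy is assumed to return logits over 231 allocations.
with torch.no_grad():
    x, edge_index, opp_history = encode_observation(game_state)  # placeholder helper
    logits = policy(x, edge_index, opp_history)
    action = torch.distributions.Categorical(logits=logits).sample()
```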

Loading Fine-tuned LLM

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load SFT or DPO model
tokenizer = AutoTokenizer.from_pretrained("./sft_model")
model = AutoModelForCausalLM.from_pretrained("./sft_model")

# Use for inference (the exact prompt format depends on the training data)
prompt = "..."  # game-state prompt for the current round
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

πŸŽ“ Research Context

This work targets the NeurIPS 2025 MindGames Workshop with a focus on:

  • Strategic game AI beyond traditional game-theoretic approaches
  • Hybrid systems combining neural RL and LLM reasoning
  • Fast adaptation to diverse opponents through meta-learning
  • Efficient deployment via distillation

Key Innovations

  1. Heterogeneous Graph Representation: Novel graph structure for Blotto game states
  2. Ground-truth Counterfactual Learning: Exploiting game determinism (see the sketch after this list)
  3. Multi-scale Representation: Field-level, round-level, and game-level embeddings
  4. LLM-to-RL Distillation: Transferring strategic reasoning to efficient policies
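
Because a Blotto round is deterministic once both allocations are revealed, the payoff of every alternative action against the opponent's observed move can be computed exactly rather than estimated, which is what the ground-truth counterfactual learning above exploits. A minimal illustrative sketch (function names are hypothetical, not from the repository):

```python
def round_score(mine, theirs):
    """Fields won minus fields lost in one deterministic Blotto round."""
    return sum((m > t) - (m < t) for m, t in zip(mine, theirs))

def counterfactual_returns(opponent_alloc, actions):
    """Exact payoff of every alternative allocation against a revealed opponent move."""
    return [round_score(a, opponent_alloc) for a in actions]

# Example: against an even split, full concentration wins one field and loses two.
opponent = (7, 7, 6)
print(round_score((20, 0, 0), opponent))   # -> 1 - 2 = -1
```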

πŸ“ Citation

If you use this work, please cite:

```bibtex
@misc{colonelblotto2025neurips,
  title={Advanced Reinforcement Learning System for Colonel Blotto Games},
  author={NeurIPS 2025 MindGames Submission},
  year={2025},
  publisher={HuggingFace Hub},
  howpublished={\url{https://huggingface.co/{repo_id}}},
}
```

πŸ“„ License

MIT License - See LICENSE file for details

πŸ™ Acknowledgments

  • Built for NeurIPS 2025 MindGames Workshop
  • Uses PyTorch, HuggingFace Transformers, and PEFT
  • Training infrastructure: NVIDIA H200 GPU

