| | --- |
| | license: mit |
| | tags: |
| | - vector-database |
| | - semantic-search |
| | - embeddings |
| | - llm |
| | - memory |
| | - hnsw |
| | - rust |
| | - python |
| | library_name: arms-hat |
| | pipeline_tag: feature-extraction |
| | --- |
| | |
| | # HAT: Hierarchical Attention Tree |
| |
|
| | **A novel index structure for AI memory systems that achieves 100% recall while building indexes 70x faster than HNSW.** |
| |
|
| | **Also: A new database paradigm for any domain with known hierarchy + semantic similarity.** |
| |
|
| | [PyPI](https://pypi.org/project/arms-hat/) |
| | [crates.io](https://crates.io/crates/arms-hat) |
| | [License](LICENSE) |
| | [Rust](https://www.rust-lang.org/) |
| | [Python](https://www.python.org/) |
| |
|
| | --- |
| |
|
| | ## Architecture |
| |
|
| | <p align="center"> |
| | <img src="images/fig01_architecture.jpg" alt="HAT Architecture" width="800"/> |
| | </p> |
| |
|
| | HAT exploits the **known hierarchy** in AI conversations: sessions contain documents, documents contain chunks. This structural prior enables O(log n) queries with 100% recall. |
| |
|
| | --- |
| |
|
| | ## Key Results |
| |
|
| | <p align="center"> |
| | <img src="images/fig09_summary_results.jpg" alt="Summary Results" width="800"/> |
| | </p> |
| |
|
| | | Metric | HAT | HNSW | Improvement | |
| | |--------|-----|------|-------------| |
| | | **Recall@10** | **100%** | 70% | +30 points | |
| | | **Build Time** | 30ms | 2.1s | **70x faster** | |
| | | **Query Latency** | 3.1ms | - | Production-ready | |
| |
|
| | *Benchmarked on hierarchically-structured AI conversation data* |
| |
|
| | --- |
| |
|
| | ## Recall Comparison |
| |
|
| | <p align="center"> |
| | <img src="images/fig02_recall_comparison.jpg" alt="HAT vs HNSW Recall" width="700"/> |
| | </p> |
| |
|
| | HAT achieves **100% recall** where HNSW achieves only ~70% on hierarchically-structured data. |
| |
|
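For readers unfamiliar with the metric, here is a minimal sketch of how Recall@k is commonly computed. It assumes the ground truth comes from an exact (brute-force) nearest-neighbour search. This README does not state the benchmark's exact definition, and the function name and sample IDs below are illustrative only.

```python
def recall_at_k(retrieved_ids, true_ids, k=10):
    """Fraction of the true top-k neighbours that appear in the retrieved top-k."""
    truth = set(true_ids[:k])
    if not truth:
        return 0.0
    return len(truth & set(retrieved_ids[:k])) / len(truth)


# Hypothetical IDs: `exact` is the brute-force top-10, `approx` misses three of them.
exact = [3, 7, 1, 9, 4, 0, 8, 2, 6, 5]
approx = [3, 7, 1, 9, 4, 0, 8, 11, 12, 13]

print(recall_at_k(exact, exact))   # 1.0
print(recall_at_k(approx, exact))  # 0.7
```

An index scores 100% only if every one of the true top-k neighbours is returned for every query.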
| | --- |
| |
|
| | ## Build Time |
| |
|
| | <p align="center"> |
| | <img src="images/fig03_build_time.jpg" alt="Build Time Comparison" width="700"/> |
| | </p> |
| |
|
| | HAT builds indexes **70x faster** than HNSW - critical for real-time applications. |
| |
|
| | --- |
| |
|
| | ## The Problem |
| |
|
| | Large language models have finite context windows. A model with a 10K-token context window can only "see" the most recent 10K tokens, so it loses access to earlier conversation history. |
| |
|
| | **Current solutions fall short:** |
| | - Longer context models: Expensive to train and run |
| | - Summarization: Lossy compression that discards detail |
| | - RAG: Re-embeds and recomputes attention on every query |
| |
|
| | ## The HAT Solution |
| |
|
| | <p align="center"> |
| | <img src="images/fig06_hat_vs_rag.jpg" alt="HAT vs RAG" width="800"/> |
| | </p> |
| |
|
| | HAT exploits **known structure** in AI workloads. Unlike general vector databases that treat data as unstructured point clouds, AI conversations have inherent hierarchy: |
| |
|
| | ``` |
| | Session (conversation boundary) |
| |   └── Document (topic or turn) |
| |         └── Chunk (individual message) |
| | ``` |
| |
|
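As a rough illustration of the three-level layout above, the following Python sketch models sessions, documents, and chunks as nested containers. The class names, the `mean_vector` helper, and the idea of summarizing each container by the mean of its children are assumptions made for illustration; they are not taken from the arms-hat source.

```python
from dataclasses import dataclass, field


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


@dataclass
class Chunk:
    """Leaf: one message and its embedding."""
    id: int
    embedding: list


@dataclass
class Document:
    """A topic or turn, holding chunks."""
    chunks: list = field(default_factory=list)

    def centroid(self):
        return mean_vector([c.embedding for c in self.chunks])


@dataclass
class Session:
    """A conversation, holding documents."""
    documents: list = field(default_factory=list)

    def centroid(self):
        return mean_vector([d.centroid() for d in self.documents])


session = Session([
    Document([Chunk(0, [1.0, 0.0]), Chunk(1, [0.0, 1.0])]),
    Document([Chunk(2, [1.0, 1.0])]),
])
print(session.documents[0].centroid())  # [0.5, 0.5]
print(session.centroid())               # [0.75, 0.75]
```

Because the nesting is known when data arrives, nothing has to be learned to place a new chunk: it simply goes into the current document of the current session.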
| | ### The Hippocampus Analogy |
| |
|
| | <p align="center"> |
| | <img src="images/fig05_hippocampus.jpg" alt="Hippocampus Analogy" width="800"/> |
| | </p> |
| |
|
| | HAT mirrors human memory architecture - functioning as an **artificial hippocampus** for AI systems. |
| |
|
| | --- |
| |
|
| | ## How It Works |
| |
|
| | ### Beam Search Query |
| |
|
| | <p align="center"> |
| | <img src="images/fig10_beam_search.jpg" alt="Beam Search" width="800"/> |
| | </p> |
| |
|
| | HAT uses beam search through the hierarchy: |
| |
|
| | ``` |
| | 1. Start at root |
| | 2. At each level, score children by cosine similarity to query |
| | 3. Keep top-b candidates (beam width) |
| | 4. Return top-k from leaf level |
| | ``` |
| |
|
| | **Complexity:** O(b · d · c) = O(log n) when balanced |
| |
|
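The four steps above can be sketched in plain Python. This is a minimal illustration, not the library's implementation. It assumes every internal node stores a summary vector, that all leaves sit at the same depth, and that `Node`, `cosine`, and `beam_search` are hypothetical names. Reading `b` as the beam width, and `d` and `c` as tree depth and children per node (the README does not define the latter two), each level scores at most `b · c` children, so the total work is `b · d · c` comparisons.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    vector: list                       # summary vector (internal) or embedding (leaf)
    children: list = field(default_factory=list)
    id: Optional[int] = None           # set on leaves only


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def beam_search(root, query, beam_width, k):
    frontier = [root]                                         # 1. start at root
    while True:
        candidates = [c for node in frontier for c in node.children]
        scored = sorted(                                      # 2. score children
            ((cosine(c.vector, query), c) for c in candidates),
            key=lambda pair: pair[0],
            reverse=True,
        )
        if all(not c.children for c in candidates):           # reached leaf level
            return [(c.id, s) for s, c in scored[:k]]         # 4. top-k leaves
        frontier = [c for _, c in scored[:beam_width]]        # 3. keep top-b


def leaf(i, v):
    return Node(v, id=i)


doc_a = Node([1.0, 0.1], [leaf(0, [1.0, 0.0]), leaf(1, [0.9, 0.2])])
doc_b = Node([0.1, 1.0], [leaf(2, [0.0, 1.0]), leaf(3, [0.2, 0.9])])
session = Node([0.5, 0.5], [doc_a, doc_b])
root = Node([0.0, 0.0], [session])

hits = beam_search(root, query=[1.0, 0.0], beam_width=1, k=2)
print([leaf_id for leaf_id, _ in hits])  # [0, 1]
```

In this sketch a wider beam scores more candidates per level, trading latency for a lower chance of pruning the branch that holds the best match.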
| | ### Consolidation Phases |
| |
|
| | <p align="center"> |
| | <img src="images/fig08_consolidation.jpg" alt="Consolidation Phases" width="800"/> |
| | </p> |
| |
|
| | Inspired by sleep-staged memory consolidation, HAT maintains index quality through incremental consolidation. |
| |
|
| | --- |
| |
|
| | ## Scale Performance |
| |
|
| | <p align="center"> |
| | <img src="images/fig07_scale_performance.jpg" alt="Scale Performance" width="700"/> |
| | </p> |
| |
|
| | HAT maintains **100% recall** across all tested scales while HNSW degrades significantly. |
| |
|
| | | Scale | HAT Build | HNSW Build | HAT R@10 | HNSW R@10 | |
| | |-------|-----------|------------|----------|-----------| |
| | | 500 | 16ms | 1.0s | **100%** | 55% | |
| | | 1000 | 25ms | 2.0s | **100%** | 44.5% | |
| | | 2000 | 50ms | 4.3s | **100%** | 67.5% | |
| | | 5000 | 127ms | 11.9s | **100%** | 55% | |
| |
|
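The build-time speedup at each scale can be derived directly from the table. The short script below only restates that arithmetic; it assumes the "Scale" column counts indexed vectors, which the README does not spell out.

```python
# (scale, HAT build in ms, HNSW build in ms), copied from the table above
rows = [
    (500, 16, 1_000),
    (1000, 25, 2_000),
    (2000, 50, 4_300),
    (5000, 127, 11_900),
]

speedups = {scale: hnsw_ms / hat_ms for scale, hat_ms, hnsw_ms in rows}

for scale, speedup in speedups.items():
    print(f"{scale:>5}: {speedup:.1f}x faster build")
# Prints 62.5x, 80.0x, 86.0x and 93.7x for the four scales.
```

On these figures the gap widens with scale, from roughly 62x at 500 to roughly 94x at 5000.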
| | --- |
| |
|
| | ## End-to-End Pipeline |
| |
|
| | <p align="center"> |
| | <img src="images/fig04_pipeline.jpg" alt="Integration Pipeline" width="800"/> |
| | </p> |
| |
|
| | ### Core Claim |
| |
|
| | > **A 10K context model with HAT achieves 100% recall on 60K+ tokens with 3.1ms latency.** |
| |
|
| | | Messages | Tokens | Context % | Recall | Latency | Memory | |
| | |----------|--------|-----------|--------|---------|--------| |
| | | 1000 | 30K | 33% | 100% | 1.7ms | 1.6MB | |
| | | 2000 | 60K | 17% | 100% | 3.1ms | 3.3MB | |
| |
|
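The "Context %" column can be read as the share of the stored history that would fit in the model's window at once. That reading is an assumption, as is the helper name `context_share`; the numbers come from the table and the core claim above.

```python
CONTEXT_WINDOW = 10_000  # tokens, from the core claim above


def context_share(window_tokens, history_tokens):
    """Fraction of the stored history that fits in the context window."""
    return window_tokens / history_tokens


for messages, tokens in [(1000, 30_000), (2000, 60_000)]:
    share = context_share(CONTEXT_WINDOW, tokens)
    print(f"{messages} messages: window covers {share:.0%} of history, "
          f"about {tokens // messages} tokens per message")
# 1000 messages: window covers 33% of history, about 30 tokens per message
# 2000 messages: window covers 17% of history, about 30 tokens per message
```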
| | --- |
| |
|
| | ## Quick Start |
| |
|
| | ### Python |
| |
|
| | ```python |
| | from arms_hat import HatIndex |
| | |
| | # Create index (1536 dimensions for OpenAI embeddings) |
| | index = HatIndex.cosine(1536) |
| | |
| | # Add messages with automatic hierarchy |
| | # (`embedding` is the 1536-dimensional vector for one message) |
| | index.add(embedding)  # Returns ID |
| | |
| | # Session/document management |
| | index.new_session() # Start new conversation |
| | index.new_document() # Start new topic |
| | |
| | # Query |
| | results = index.near(query_embedding, k=10) |
| | for result in results: |
| |     print(f"ID: {result.id}, Score: {result.score:.4f}") |
| | |
| | # Persistence |
| | index.save("memory.hat") |
| | loaded = HatIndex.load("memory.hat") |
| | ``` |
| |
|
| | ### Rust |
| |
|
| | ```rust |
| | use hat::{HatIndex, HatConfig}; |
| | |
| | // Create index |
| | let config = HatConfig::default(); |
| | let mut index = HatIndex::new(config, 1536); |
| | |
| | // Add points |
| | let id = index.add(&embedding); |
| | |
| | // Query |
| | let results = index.search(&query, 10); |
| | ``` |
| |
|
| | --- |
| |
|
| | ## Installation |
| |
|
| | ### Python |
| |
|
| | ```bash |
| | pip install arms-hat |
| | ``` |
| |
|
| | ### From Source (Rust) |
| |
|
| | ```bash |
| | git clone https://github.com/automate-capture/hat.git |
| | cd hat |
| | cargo build --release |
| | ``` |
| |
|
| | ### Python Development |
| |
|
| | ```bash |
| | cd python |
| | pip install maturin |
| | maturin develop |
| | ``` |
| |
|
| | --- |
| |
|
| | ## Project Structure |
| |
|
| | ``` |
| | hat/ |
| | ├── src/                    # Rust implementation |
| | │   ├── lib.rs              # Library entry point |
| | │   ├── index.rs            # HatIndex implementation |
| | │   ├── container.rs        # Tree node types |
| | │   ├── consolidation.rs    # Background maintenance |
| | │   └── persistence.rs      # Save/load functionality |
| | ├── python/                 # Python bindings (PyO3) |
| | │   └── arms_hat/           # Python package |
| | ├── benchmarks/             # Performance comparisons |
| | ├── examples/               # Usage examples |
| | ├── paper/                  # Research paper (PDF) |
| | ├── images/                 # Figures and diagrams |
| | └── tests/                  # Test suite |
| | ``` |
| |
|
| | --- |
| |
|
| | ## Reproducing Results |
| |
|
| | ```bash |
| | # Run HAT vs HNSW benchmark |
| | cargo test --test phase31_hat_vs_hnsw -- --nocapture |
| | |
| | # Run real embedding dimension tests |
| | cargo test --test phase32_real_embeddings -- --nocapture |
| | |
| | # Run persistence tests |
| | cargo test --test phase33_persistence -- --nocapture |
| | |
| | # Run end-to-end LLM demo |
| | python examples/demo_hat_memory.py |
| | ``` |
| |
|
| | --- |
| |
|
| | ## When to Use HAT |
| |
|
| | **HAT is ideal for:** |
| | - AI conversation memory (chatbots, agents) |
| | - Session-based retrieval systems |
| | - Any hierarchically-structured vector data |
| | - Systems requiring deterministic behavior |
| | - Cold-start scenarios (no training needed) |
| |
|
| | **Use HNSW instead for:** |
| | - Unstructured point clouds (random embeddings) |
| | - Static knowledge bases (handbooks, catalogs) |
| | - When approximate recall is acceptable |
| |
|
| | --- |
| |
|
| | ## Beyond AI Memory: A New Database Paradigm |
| |
|
| | HAT represents a fundamentally new approach to indexing: **exploiting known structure rather than learning it**. |
| |
|
| | | Database Type | Structure | Semantics | |
| | |---------------|-----------|-----------| |
| | | Relational | Explicit (foreign keys) | None | |
| | | Document | Implicit (nesting) | None | |
| | | Vector (HNSW) | Learned from data | Yes | |
| | | **HAT** | **Explicit + exploited** | **Yes** | |
| |
|
| | Traditional vector databases treat embeddings as unstructured point clouds, spending compute to *discover* topology. HAT inverts this: **known hierarchy is free information - use it.** |
| |
|
| | ### General Applications |
| |
|
| | Any domain with **hierarchical structure + semantic similarity** benefits from HAT: |
| |
|
| | - **Legal/Medical Documents:** Case → Filing → Paragraph → Sentence |
| | - **Code Search:** Repository → Module → Function → Line |
| | - **IoT/Sensor Networks:** Facility → Zone → Device → Reading |
| | - **E-commerce:** Catalog → Category → Product → Variant |
| | - **Research Corpora:** Journal → Paper → Section → Citation |
| |
|
| | ### The Core Insight |
| |
|
| | > *"Position IS relationship. No foreign keys needed - proximity defines connection."* |
| |
|
| | HAT combines the structural guarantees of document databases with the semantic power of vector search, without the computational overhead of learning topology from scratch. |
| |
|
| | --- |
| |
|
| | ## Citation |
| |
|
| | ```bibtex |
| | @article{hat2026, |
| | title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory}, |
| | author={Young, Lucas and Automate Capture Research}, |
| | year={2026}, |
| | url={https://research.automate-capture.com/hat} |
| | } |
| | ``` |
| |
|
| | --- |
| |
|
| | ## Paper |
| |
|
| | **[Read the Full Paper (PDF)](paper/HAT_Context_Extension_Young_2026.pdf)** |
| |
|
| | --- |
| |
|
| | ## License |
| |
|
| | MIT License - see [LICENSE](LICENSE) for details. |
| |
|
| | --- |
| |
|
| | ## Links |
| |
|
| | - **Research Site:** [research.automate-capture.com/hat](https://research.automate-capture.com/hat) |
| | - **Main Site:** [automate-capture.com](https://automate-capture.com) |
| |
|