g023dev (g023)
0 followers · 1 following
AI & ML interests
AI datasets, AI training
Recent Activity
reacted to OzTianlu's post with 🤗 · 1 day ago
O(1) inference is the foundational design of Spartacus-1B-Instruct 🛡️! https://huggingface.co/NoesisLab/Spartacus-1B-Instruct
We have replaced the KV-cache bottleneck inherent in softmax attention with Causal Monoid State Compression. By defining the causal history as a monoid recurrence, the entire prefix is lossily compressed into a fixed-size state matrix per head. The technical core of this architecture relies on the associativity of the monoid operator:
Training: a parallel prefix scan, using Triton-accelerated JIT kernels, computes all prefix states simultaneously.
Inference: true sequential updates. Memory and time complexity per token are decoupled from sequence length.
Explicit causality: we discard RoPE and attention masks. Causality is a first-class citizen, explicitly modeled through learned, content-dependent decay gates.
Current zero-shot benchmarks show that Spartacus-1B-Instruct (1.3B) already outperforms established sub-quadratic models such as Mamba-1.4B and RWKV-6-1.6B on ARC-Challenge (0.3063). Recent integration of structured Chain-of-Thought (CoT) data has further pushed reasoning accuracy to 75%. The "Spartacus" era is about scaling intelligence, not the memory wall ♾️.
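The training/inference duality described in the post rests on one property: if the per-token state update is an associative monoid operation, the same recurrence can be computed as a parallel prefix scan (training) or as a constant-memory sequential loop (inference). A minimal NumPy sketch of that idea is below; the recurrence form S_t = a_t · S_{t-1} + outer(k_t, v_t) and all names are illustrative assumptions, not the Spartacus-1B-Instruct implementation.

```python
import numpy as np

# Each token contributes a pair (decay a_t, update B_t) with B_t = outer(k_t, v_t).
# The recurrence S_t = a_t * S_{t-1} + B_t is expressed as an associative
# combine on pairs, which is what licenses a parallel prefix scan.

def combine(x, y):
    """Associative monoid operator on (decay, state) pairs."""
    a1, b1 = x
    a2, b2 = y
    return (a1 * a2, a2 * b1 + b2)

def sequential_scan(pairs):
    """Inference-style loop: one fixed-size state, one combine per token,
    so per-token cost is independent of sequence length."""
    state = pairs[0]
    states = [state]
    for p in pairs[1:]:
        state = combine(state, p)
        states.append(state)
    return states

def tree_reduce(pairs):
    """Balanced-tree reduction to the final prefix state; reordering the
    combines this way is legal only because combine() is associative."""
    if len(pairs) == 1:
        return pairs[0]
    mid = len(pairs) // 2
    return combine(tree_reduce(pairs[:mid]), tree_reduce(pairs[mid:]))

rng = np.random.default_rng(0)
T, d = 8, 4  # toy sequence length and head dimension
pairs = [
    (rng.uniform(0.5, 1.0),  # content-dependent decay gate (scalar here)
     np.outer(rng.normal(size=d), rng.normal(size=d)))
    for _ in range(T)
]

seq_final = sequential_scan(pairs)[-1]
tree_final = tree_reduce(pairs)
# Sequential and tree-structured evaluation agree on the compressed state.
assert np.allclose(seq_final[1], tree_final[1])
```

In a real model the decay would typically be a learned, per-head (and possibly per-channel) gate rather than a scalar, and the tree reduction would be replaced by a full prefix scan in a fused GPU kernel, but the associativity argument is the same.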
updated a model · 20 days ago: g023/Qwen3-8B-DMS-8x-4bit-NF4
upvoted a paper · 21 days ago: Inference-Time Hyper-Scaling with KV Cache Compression
Organizations
None yet
models (1)
g023/Qwen3-8B-DMS-8x-4bit-NF4
Text Generation • 8B • Updated 20 days ago • 581 • 1
datasets (1)
g023/synth_prompt_reasoning_conclusion_answer
Updated Mar 7, 2025 • 4