g023dev (g023)
0 followers · 1 following
AI & ML interests
AI datasets, AI training
Recent Activity
reacted to OzTianlu's post with 🤗 · 1 day ago
O(1) inference is the foundational design of Spartacus-1B-Instruct 🛡️! https://huggingface.co/NoesisLab/Spartacus-1B-Instruct
We have replaced the KV-cache bottleneck inherent in softmax attention with Causal Monoid State Compression. By defining the causal history as a monoid recurrence, the entire prefix is lossily compressed into a fixed-size state matrix per head. The technical core of this architecture relies on the associativity of the monoid operator:
Training: a parallel prefix scan, using Triton-accelerated JIT kernels, computes all prefix states simultaneously.
Inference: true sequential updates. Memory and time complexity per token are decoupled from sequence length.
Explicit causality: we discard RoPE and attention masks. Causality is a first-class citizen, explicitly modeled through learned, content-dependent decay gates.
Current zero-shot benchmarks show that Spartacus-1B-Instruct (1.3B) already outperforms established sub-quadratic models such as Mamba-1.4B and RWKV-6-1.6B on ARC-Challenge (0.3063). Recent integration of structured Chain-of-Thought (CoT) data has further pushed reasoning accuracy to 75%. The "Spartacus" era is about scaling intelligence, not the memory wall ♾️.
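The training/inference duality described in the post rests on one property: if the per-token state update is an associative monoid operation, the same recurrence can be computed as a parallel prefix scan (training) or as a constant-memory sequential loop (inference). A minimal NumPy sketch of that idea is below; the recurrence form S_t = a_t · S_{t-1} + outer(k_t, v_t) and all names are illustrative assumptions, not the Spartacus-1B-Instruct implementation.

```python
import numpy as np

# Each token contributes a pair (decay a_t, update B_t) with B_t = outer(k_t, v_t).
# The recurrence S_t = a_t * S_{t-1} + B_t is expressed as an associative
# combine on pairs, which is what licenses a parallel prefix scan.

def combine(x, y):
    """Associative monoid operator on (decay, state) pairs."""
    a1, b1 = x
    a2, b2 = y
    return (a1 * a2, a2 * b1 + b2)

def sequential_scan(pairs):
    """Inference-style loop: one fixed-size state, one combine per token,
    so per-token cost is independent of sequence length."""
    state = pairs[0]
    states = [state]
    for p in pairs[1:]:
        state = combine(state, p)
        states.append(state)
    return states

def tree_reduce(pairs):
    """Balanced-tree reduction to the final prefix state; reordering the
    combines this way is legal only because combine() is associative."""
    if len(pairs) == 1:
        return pairs[0]
    mid = len(pairs) // 2
    return combine(tree_reduce(pairs[:mid]), tree_reduce(pairs[mid:]))

rng = np.random.default_rng(0)
T, d = 8, 4  # toy sequence length and head dimension
pairs = [
    (rng.uniform(0.5, 1.0),  # content-dependent decay gate (scalar here)
     np.outer(rng.normal(size=d), rng.normal(size=d)))
    for _ in range(T)
]

seq_final = sequential_scan(pairs)[-1]
tree_final = tree_reduce(pairs)
# Sequential and tree-structured evaluation agree on the compressed state.
assert np.allclose(seq_final[1], tree_final[1])
```

In a real model the decay would typically be a learned, per-head (and possibly per-channel) gate rather than a scalar, and the tree reduction would be replaced by a full prefix scan in a fused GPU kernel, but the associativity argument is the same.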
updated a model · 20 days ago: g023/Qwen3-8B-DMS-8x-4bit-NF4
upvoted a paper · 21 days ago: Inference-Time Hyper-Scaling with KV Cache Compression
Organizations
None yet
models (1)
g023/Qwen3-8B-DMS-8x-4bit-NF4
Text Generation • 8B • Updated 20 days ago • 581 • 1
datasets (1)
g023/synth_prompt_reasoning_conclusion_answer
Updated Mar 7, 2025 • 4