Mitko Vasilev

mitkox

AI & ML interests

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

Recent Activity

posted an update about 6 hours ago

Got to 1199.8 tokens/sec with Devstral Small 2 on my desktop GPU workstation, running vLLM nightly. Works out of the box with Mistral Vibe. Next, it's time to test the big one.

posted an update 17 days ago

I run 20 AI coding agents locally on my desktop workstation at 400+ tokens/sec with MiniMax-M2. It's a Sonnet drop-in replacement in my Cursor, Claude Code, Droid, Kilo and Cline. It peaks at 11k tok/sec input and 433 tok/sec output and can generate 1B+ tok/m, all with a 196k context window. I've been running it for 6 days now with this config. Today, max performance was stable at 490.2 tokens/sec across 48 concurrent clients with MiniMax-M2. Hardware: Z8 Fury G5, Xeon 3455, 4xA6K. Software: Aibrix 0.5.0, vLLM 0.11.2.
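As a rough sanity check on the "1B+ tok/m" figure, here is a minimal sketch of the arithmetic. It assumes "tok/m" means tokens per month, a 30-day month, and that the 433 tok/sec peak output rate is sustained around the clock; none of these assumptions is confirmed by the post, and the function name is made up for illustration.

```python
# Back-of-the-envelope check: sustained output rate -> tokens per month.
# Assumptions (not stated in the post): "tok/m" = tokens per month,
# a 30-day month, and 24/7 operation at the peak output rate.

SECONDS_PER_DAY = 24 * 60 * 60


def tokens_generated(tokens_per_sec, days):
    """Total tokens produced at a constant rate over the given number of days."""
    return tokens_per_sec * SECONDS_PER_DAY * days


monthly = tokens_generated(433, 30)
print(f"{monthly:,} tokens in 30 days")  # 1,122,336,000 tokens in 30 days
```

Under those assumptions the total comes out slightly above one billion, which is consistent with the claim.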

posted an update about 1 month ago

I just threw Qwen3-0.6B in BF16 into an on-device AI drag race on AMD Strix Halo with vLLM: 564 tokens/sec on short 100-token sprints and 96 tokens/sec on 8K-token marathons. TL;DR: You don't just run AI on AMD. You negotiate with it. The hardware absolutely delivers. Spoiler alert: there is exactly ONE configuration where vLLM + ROCm + Triton + PyTorch + drivers + Ubuntu kernel all work at the same time. Finding it required the patience of a saint. Consumer AMD for AI inference is the ultimate "budget warrior" play: insane performance-per-euro, but you need hardcore technical skills that would make a senior sysadmin nod in quiet respect.

Organizations

New activity in open-acc/README about 1 year ago

Bye Apple and hi NVIDIA

#6 opened about 1 year ago by

New activity in mitkox/WhiteRabbitNeo-2.5-Qwen-2.5-Coder-7B-mlx about 1 year ago

Upload folder using huggingface_hub

#1 opened about 1 year ago by

New activity in microsoft/kosmos-2.5 over 1 year ago

Apply for community grant: Academic project

#1 opened over 1 year ago by

New activity in google/gemma-7b almost 2 years ago

How long does this approval process take?

#10 opened almost 2 years ago by

New activity in TheBloke/WhiteRabbitNeo-33B-v1-GGUF almost 2 years ago

Not able to run this model?

#1 opened almost 2 years ago by