ik_llama.cpp version

#11
by geveent - opened

MiniMax M2.5 IQ5_K ran fine on my AMD 9600X + RTX 5090 setup, at about 9 t/s. After I updated ik_llama.cpp it became unusable (repeating words and very slow). I had to roll back to f7923739 (build 4081), and everything was back to normal. Just wanted to share my experience.
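In case anyone wants to do the same, a rough sketch of the rollback I did looks like this (assuming a CUDA CMake build; the commit hash is the one mentioned above, and the exact CMake flags may differ for your setup):

```bash
# Check out the last known-good commit (build 4081)
git checkout f7923739

# Rebuild from scratch so nothing from the newer build is left over
rm -rf build
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```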

@geveent

Heya, good seeing you around!

Can you provide the full command you were using for testing? Also, strangely, the smaller IQ4_NL version might be slightly better (it shows better perplexity, though I didn't test KLD stats). You could use the ik version of it, which is a little better than the mainline version (the mainline version is mainly for Vulkan/Mac/mainline llama.cpp folks).

If I understand correctly, it works fine until you get past a certain context length? What client were you using (the built-in web UI, or an agentic coding tool like opencode, etc.)?

Cheers!
