ik_llama.cpp version

#11
by geveent - opened

MiniMax M2.5 IQ5_K ran fine on my AMD 9600X + RTX 5090 setup, at about 9 t/s. After I updated ik_llama.cpp it became unusable (repeating words and very slow). I had to roll back to f7923739 (build 4081), and everything was back to normal. Just wanted to share my experience.
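In case anyone wants to do the same, a rough sketch of the rollback I did looks like this (assuming a CUDA CMake build; the commit hash is the one mentioned above, and the exact CMake flags may differ for your setup):

```bash
# Check out the last known-good commit (build 4081)
git checkout f7923739

# Rebuild from scratch so nothing from the newer build is left over
rm -rf build
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```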

@geveent

Heya, good seeing you around!

Can you provide the full command you were using for testing? Also, strangely, the smaller IQ4_NL version might be slightly better (it shows better perplexity, though I didn't test KLD stats). You could use the ik version of it, which is a little better than the mainline version (the mainline version is mainly for Vulkan/Mac/mainline llama.cpp folks).

If I understand correctly, it works fine until you get past a certain context length? What client were you using (the built-in web UI, or an agentic coding tool like opencode, etc.)?

Cheers!
