4.6?
Hi @anikifoss ,
I do prefer using your imatrix-free quants for all large models.
Do you plan on making quants for GLM-4.6, which everyone seems so excited about?
I haven't heard much comparison between DeepSeek and GLM, which is quite understandable given DeepSeek's size, but I know you liked DeepSeek-v3.1 a lot, so I'd be interested to hear your opinion. Did you have a chance to try GLM-4.6? Did you make your own quants? Or do you just run it in Q8_0 or BF16?
So many questions :).
I'm slowly catching up, still downloading your other older quants...
Thank you.
Oh, thank you!
I read it as "hell yeah, 3.1 Terminus is goooood!" :)
Yeah, totally! As of the DeepSeek-V3.1 release, all the coding tasks I wanted to do started magically working. And then DeepSeek-V3.1-Terminus one-shotted the code for a working WebGL-based Minecraft clone after some manual pre-planning. None of the older models could get the 3D math and mesh chunking algorithms right; it was always a black screen or broken 3D models with incorrect texturing. But now DeepSeek-V3.1-Terminus is spitting out fully functional game prototypes on demand.
In the DeepSeek-V3.1 release announcement (see "long context extension approach"), they mentioned additional training for extended contexts. I think that made a huge difference, because now the model is able to work with larger codebases past 32k tokens, and even past 64k tokens, whereas previous models would degrade at 32k and become unusable after 64k tokens.
Sounds great. Do you use your HQ4_K quants, or something larger, to get those promising results?
Also, did you manage to make ik_llama handle tool calling correctly? I failed to get it to work last time I tried, so I fall back to llama.cpp when I need tool calling.
Sorry for all the questions, I haven't tried to use LLMs for "serious" code generation yet, just small things here and there :).
I use HQ4_K quants exclusively.
I find both llama.cpp and ik_llama.cpp can be buggy in different ways, and it changes every month as old bugs are fixed and new features are added. I usually switch between them depending on the model, task, and runtime issues.
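For what it's worth, when llama.cpp does work for tool calling, it goes through llama-server's OpenAI-compatible `/v1/chat/completions` endpoint (typically with `--jinja` enabled so the model's chat template can emit tool-call tokens). A minimal sketch of the request body, where the `read_file` tool and the model name are made-up placeholders, not anything from llama.cpp itself:

```python
import json

def build_tool_call_request(prompt: str) -> dict:
    """Build an OpenAI-style chat completion request with one tool attached.

    The tool definition follows the standard function-calling schema that
    llama-server accepts; "read_file" here is a hypothetical example tool.
    """
    return {
        "model": "local",  # placeholder; llama-server serves whatever model it loaded
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "read_file",  # hypothetical tool for illustration
                "description": "Read a file from the workspace",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
    }

# Serialize the body you would POST to http://localhost:8080/v1/chat/completions
body = json.dumps(build_tool_call_request("Show me the contents of main.rs"))
```

If the model and template cooperate, the response comes back with a `tool_calls` entry instead of plain text; when that part misbehaves in ik_llama, falling back to llama.cpp as described above is a reasonable workaround.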