Testing IQ4_NL

#12
by shewin - opened

W790E Sage + QYFS + 512G + RTX5090


Tensor blk.61.ffn_down_exps.weight buffer type overriden to CPU
llm_load_tensors: offloading 62 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 63/63 layers to GPU
llm_load_tensors: CPU buffer size = 120528.00 MiB
llm_load_tensors: CUDA_Host buffer size = 329.70 MiB
llm_load_tensors: CUDA0 buffer size = 3441.36 MiB
....................................................................................................
~ggml_backend_cuda_context: have 0 graphs
=====================================
llama_init_from_model: f16
llama_init_from_model: n_ctx = 200192
llama_init_from_model: n_batch = 4096
llama_init_from_model: n_ubatch = 4096
llama_init_from_model: flash_attn = 1
llama_init_from_model: attn_max_b = 512
llama_init_from_model: fused_moe = 1
llama_init_from_model: grouped er = 1
llama_init_from_model: fused_up_gate = 1
llama_init_from_model: fused_mmad = 1
llama_init_from_model: rope_cache = 0
llama_init_from_model: graph_reuse = 1
llama_init_from_model: k_cache_hadam = 0
llama_init_from_model: split_mode_graph_scheduling = 0
llama_init_from_model: reduce_type = f16
llama_init_from_model: sched_async = 0
llama_init_from_model: ser = -1, 0
llama_init_from_model: freq_base = 5000000.0
llama_init_from_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 19696.65 MiB
llama_init_from_model: KV self size = 19696.62 MiB, K (q6_0): 9848.31 MiB, V (q6_0): 9848.31 MiB
llama_init_from_model: CUDA_Host output buffer size = 0.76 MiB
llama_init_from_model: CUDA0 compute buffer size = 3222.00 MiB
llama_init_from_model: CUDA_Host compute buffer size = 1612.05 MiB
llama_init_from_model: graph nodes = 2361
llama_init_from_model: graph splits = 126
XXXXXXXXXXXXXXXXXXXXX Setting only active experts offload
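
A quick sanity check on the KV cache figure above: q6_0 stores 6.5 bits per element (32-element blocks of 26 bytes), so assuming the 62 repeating layers and a 1024-wide K projection per token per layer (the 1024 is my inference, not printed in the log), the K half works out to exactly the reported size:

```bash
# q6_0 = 6.5 bits/element; shape assumptions: n_ctx = 200192, 62 layers,
# K width 1024 per token per layer (inferred, not shown in the log)
awk 'BEGIN {
    k_bytes = 200192 * 62 * 1024 * 6.5 / 8
    printf "K cache = %.2f MiB\n", k_bytes / 1048576
}'
# -> K cache = 9848.31 MiB, matching "K (q6_0): 9848.31 MiB"; V is identical.
```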

main: n_kv_max = 200192, n_batch = 4096, n_ubatch = 4096, flash_attn = 1, n_gpu_layers = 99, n_threads = 101, n_threads_batch = 101
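
For anyone wanting to reproduce this, a llama-sweep-bench invocation consistent with the settings above might look roughly like the sketch below. The model filename and the -ot pattern are guesses; the remaining flags mirror the printed values (ik_llama.cpp flag names). The "grouped er = 1" and "Setting only active experts offload" lines come from further options I can't reconstruct from this log alone.

```bash
# Hypothetical reconstruction: model path and -ot regex are assumptions;
# -c/-b/-ub = n_ctx/n_batch/n_ubatch, -fa = flash_attn, -ctk/-ctv set the
# q6_0 K/V cache types, -fmoe = fused_moe, -ngl/-t from the main: line.
./llama-sweep-bench -m ./model-IQ4_NL.gguf \
    -c 200192 -b 4096 -ub 4096 -fa \
    -ctk q6_0 -ctv q6_0 -fmoe \
    -ngl 99 -t 101 \
    -ot "ffn_down_exps=CPU"
```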

|    PP |     TG |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
|  4096 |   1024 |      0 |   12.206 |   335.58 |   37.231 |    27.50 |
|  4096 |   1024 |   4096 |   12.200 |   335.75 |   30.720 |    33.33 |
|  4096 |   1024 |   8192 |   22.922 |   178.69 |   32.279 |    31.72 |
|  4096 |   1024 |  12288 |   21.965 |   186.48 |   46.059 |    22.23 |
|  4096 |   1024 |  16384 |   23.216 |   176.43 |   47.742 |    21.45 |
|  4096 |   1024 |  20480 |   23.498 |   174.31 |   47.164 |    21.71 |
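
Reading the table: PP and TG are the prompt and generated tokens per step, N_KV is the context depth the step starts at, and the S_* columns are just tokens divided by seconds, e.g. for the first row:

```bash
# S_PP = PP / T_PP and S_TG = TG / T_TG, checked against the first row
awk 'BEGIN { printf "S_PP = %.2f t/s, S_TG = %.2f t/s\n", 4096/12.206, 1024/37.231 }'
# -> S_PP = 335.57 t/s, S_TG = 27.50 t/s (matches the table within rounding)
```

The prompt rate roughly halving once N_KV passes 8192 looks like the usual attention-cost growth at deeper contexts.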

[screenshot: 2026-02-16_01-15]

I love seeing your wild vibe-coded creations! Happy Lunar New Year! 🌚🎆🎉🎊

Retest:

[screenshot: 2026-02-16_18-17]
[screenshot: 2026-02-16_20-05]
[screenshot: 2026-02-16_21-48]
