Testing IQ4_NL
W790E Sage + QYFS + 512 GB RAM + RTX 5090
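For anyone wanting to reproduce: the settings in the log below map onto an ik_llama.cpp llama-sweep-bench invocation along these lines. This is a reconstruction from the log, not the exact command — the model path, the `-ot` regex, and `-ger` (my guess for the `grouped er = 1` line) are assumptions, and whatever option produced the "only active experts offload" message isn't recoverable from the log:

```sh
# Flags mapped from the log: -c/-b/-ub = n_ctx/n_batch/n_ubatch, -fa = flash_attn,
# -amb = attn_max_b, -fmoe = fused_moe, -ctk/-ctv = q6_0 K/V cache,
# -ngl/-t from the "main:" line. $MODEL and the -ot regex are placeholders/guesses.
./llama-sweep-bench -m "$MODEL" \
    -c 200192 -b 4096 -ub 4096 \
    -fa -amb 512 -fmoe -ger \
    -ctk q6_0 -ctv q6_0 \
    -ngl 99 -t 101 \
    -ot "exps=CPU"
```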
Tensor blk.61.ffn_down_exps.weight buffer type overriden to CPU
llm_load_tensors: offloading 62 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 63/63 layers to GPU
llm_load_tensors: CPU buffer size = 120528.00 MiB
llm_load_tensors: CUDA_Host buffer size = 329.70 MiB
llm_load_tensors: CUDA0 buffer size = 3441.36 MiB
....................................................................................................
~ggml_backend_cuda_context: have 0 graphs
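Quick size sanity check: the CPU + CUDA_Host + CUDA0 weight buffers total 124,299 MiB ≈ 121.4 GiB. At IQ4_NL's nominal 4.5 bits/weight that works out to roughly 230B parameters — ballpark only, since the embedding/output and a few other tensors are usually stored in other types.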
=====================================
llama_init_from_model: f16
llama_init_from_model: n_ctx = 200192
llama_init_from_model: n_batch = 4096
llama_init_from_model: n_ubatch = 4096
llama_init_from_model: flash_attn = 1
llama_init_from_model: attn_max_b = 512
llama_init_from_model: fused_moe = 1
llama_init_from_model: grouped er = 1
llama_init_from_model: fused_up_gate = 1
llama_init_from_model: fused_mmad = 1
llama_init_from_model: rope_cache = 0
llama_init_from_model: graph_reuse = 1
llama_init_from_model: k_cache_hadam = 0
llama_init_from_model: split_mode_graph_scheduling = 0
llama_init_from_model: reduce_type = f16
llama_init_from_model: sched_async = 0
llama_init_from_model: ser = -1, 0
llama_init_from_model: freq_base = 5000000.0
llama_init_from_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 19696.65 MiB
llama_init_from_model: KV self size = 19696.62 MiB, K (q6_0): 9848.31 MiB, V (q6_0): 9848.31 MiB
llama_init_from_model: CUDA_Host output buffer size = 0.76 MiB
llama_init_from_model: CUDA0 compute buffer size = 3222.00 MiB
llama_init_from_model: CUDA_Host compute buffer size = 1612.05 MiB
llama_init_from_model: graph nodes = 2361
llama_init_from_model: graph splits = 126
XXXXXXXXXXXXXXXXXXXXX Setting only active experts offload
main: n_kv_max = 200192, n_batch = 4096, n_ubatch = 4096, flash_attn = 1, n_gpu_layers = 99, n_threads = 101, n_threads_batch = 101
| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|---|---|---|---|---|---|---|
| 4096 | 1024 | 0 | 12.206 | 335.58 | 37.231 | 27.50 |
| 4096 | 1024 | 4096 | 12.200 | 335.75 | 30.720 | 33.33 |
| 4096 | 1024 | 8192 | 22.922 | 178.69 | 32.279 | 31.72 |
| 4096 | 1024 | 12288 | 21.965 | 186.48 | 46.059 | 22.23 |
| 4096 | 1024 | 16384 | 23.216 | 176.43 | 47.742 | 21.45 |
| 4096 | 1024 | 20480 | 23.498 | 174.31 | 47.164 | 21.71 |
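The KV numbers check out, by the way: q6_0 packs 32 elements into 26 bytes (24 data bytes + a 2-byte scale), i.e. 6.5 bpw or 0.8125 bytes/element. Assuming 62 KV-bearing layers and 1024 K elements per token per layer (e.g. 8 KV heads × 128 head dim — inferred, not in the log), the K cache is 200192 × 62 × 1024 × 0.8125 B = 9848.31 MiB, exactly the reported size; V is identical, giving the 19696.62 MiB total.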
I love seeing your wild vibe-coded creations! Happy Lunar New Year!