vLLM error for KV weight scaling - workaround
3
#6 opened about 10 hours ago by ShaunEvansMD
Thanks for your effort
5
#5 opened about 17 hours ago by darkstar3537
FP8 KV cache
9
#4 opened 1 day ago by festr2
KeyError: '110.w1.input_scale' with TRT
2
#3 opened 2 days ago by guanwenyu1995
"w1_weight_scale_2 must match w3_weight_scale_2. Accuracy may be affected."
👍
1
14
#2 opened 3 days ago by zenmagnets
Here's the vLLM recipe I'm using with 2x RTX Pro 6000
👍
2
10
#1 opened 3 days ago by zenmagnets