Thanks for your effort
#5
by
darkstar3537
- opened
Appreciate you uploading this. It seems very capable, but I'm seeing thought looping endlessly using latest sglang and recipe found in other thread on here. Any ideas? Using 2 x RTX 6000 Pro cards
Have you tried with BF16 KV cache?
Yep, still eventually ends up looping and going off the rails
Maybe try playing with temperature, top_k/p and repetition_penalty?
Yes I will try that and report back. Thanks
let me know! apparently there is a bug in vllm but I dont see how...