Thanks for your effort

#5
by darkstar3537 - opened

Appreciate you uploading this. It seems very capable, but I'm seeing the model's thinking loop endlessly with the latest SGLang and the recipe from the other thread on here. Any ideas? I'm using 2 x RTX 6000 Pro cards.

Have you tried with BF16 KV cache?

Yep, still eventually ends up looping and going off the rails

Maybe try playing with temperature, top_k/p and repetition_penalty?
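For anyone wanting to try this, here is a rough sketch of passing those knobs per request. It assumes the server exposes SGLang's native `/generate` endpoint on the default port 30000. The helper names (`build_generate_payload`, `send_generate`) are made up for illustration, and the numeric values are only starting points to experiment with, not settings confirmed for this model.

```python
import json
import urllib.request


def build_generate_payload(prompt, temperature=0.6, top_p=0.95, top_k=20,
                           repetition_penalty=1.05, max_new_tokens=2048):
    """Build a request body for SGLang's native /generate endpoint.

    All default values here are illustrative assumptions, not recommendations.
    """
    return {
        "text": prompt,
        "sampling_params": {
            "temperature": temperature,
            "top_p": top_p,
            "top_k": top_k,
            # Values above 1.0 penalize tokens that have already appeared.
            "repetition_penalty": repetition_penalty,
            # A hard cap so a looping generation still terminates.
            "max_new_tokens": max_new_tokens,
        },
    }


def send_generate(payload, base_url="http://localhost:30000"):
    """POST the payload to the server (base_url is an assumption) and return the parsed JSON."""
    request = urllib.request.Request(
        base_url + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs a running server):
# result = send_generate(build_generate_payload("Explain KV cache in one line."))
```

Changing one parameter at a time makes it easier to tell which setting, if any, affects the looping.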

Yes, I will try that and report back. Thanks!

Let me know! Apparently there is a bug in vLLM, but I don't see how...
