Choppy sound

#14
by acatovic - opened

I got everything up and running using https://github.com/NVIDIA/personaplex, but I get choppy sound and the system is unusable. My specs: 64 GiB RAM and an RTX 5070 with 12 GiB VRAM. Are these specs sufficient, and if so, why do I get the choppiness?

I will try to reproduce your setup to see why you get the choppiness. Could you verify whether your inference is running on the GPU or is somehow falling back to the CPU?
Also, could you try this suggested installation fix for Blackwell-based GPUs:
https://github.com/NVIDIA/personaplex/issues/2
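
A quick way to verify the GPU question, assuming a standard PyTorch install:

python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"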

Nice, I'll try that fix and get back to you with results and extra info.

That didn't help unfortunately, and I can confirm it's running on the GPU, since I can see memory usage and utilization via nvidia-smi -l 1
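
For a more targeted view than the full nvidia-smi dump, these standard query flags print just memory use and GPU utilization once per second:

nvidia-smi --query-gpu=memory.used,utilization.gpu --format=csv -l 1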

I tried the offline/evaluation method and that works fine (and confirmed inference is on the GPU), i.e. I ran

python -m moshi.offline --voice-prompt "NATF2.pt" --input-wav "input_assistant.wav" --seed 42424242 --output-wav "output.wav" --output-text "output.json"

See the enclosed output WAV.

So it's likely not the model itself but the whole streaming (frontend<->server) setup. I previously built my own local voice assistant, i.e. ASR->LLM->TTS, and recall having some issues with the streaming, so I landed on a non-dynamic approach (see https://github.com/acatovic/ova).

I will poke around a bit more.

Others are raising this choppiness problem with Blackwell GPUs as well. I am working on reproducing the problem and finding a fix.

Strange, I have been getting torch.OutOfMemoryError on my 5090, even with the offline/evaluation method.

NVIDIA updated their GitHub repo earlier today to add a lowvram flag to the launch command. I'm able to run it now on a 5090 with no problem.

I am also having the same issue with my RTX 5070 Ti. Could you let us know how this could be fixed?

The reason you are experiencing choppy sound is likely GPU offloading to the CPU. Unless you have high-frequency DDR5 memory, performance won't be great. If you don't have CPU offload enabled and you set a max memory cap, then you are likely to hit the out-of-memory exception.
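
For what it's worth, a "max memory cap" like the one described above can be set in plain PyTorch like this (a minimal sketch; the actual PersonaPlex option may differ):

import torch

# Cap this process at 90% of the GPU's VRAM; allocations beyond
# the cap raise torch.OutOfMemoryError.
torch.cuda.set_per_process_memory_fraction(0.9, device=0)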

On my Docker container I used TORCHDYNAMO_DISABLE=1 to avoid having to figure out a Triton error I was getting. I'm not sure if using Triton would improve memory.
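
For reference, that environment variable can be passed at container launch; the image name below is a placeholder:

docker run --gpus all -e TORCHDYNAMO_DISABLE=1 personaplex:latest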

I set max memory to 0.9 on my RTX 3090 (24 GB); it allocates ~20 GB of VRAM.
[screenshot: ~20 GB VRAM allocated]

On stream initialization it spikes to 100% of available memory. If I didn't cap max memory, I would likely also get an out-of-memory exception.
[screenshot: memory spike at stream initialization]

During conversation it can spike to use all available memory, but speech output generally seems to take up roughly 18 GB of VRAM.
[screenshot: ~18 GB VRAM during conversation]
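
If you want the same numbers without screenshots, these standard PyTorch calls log current and peak allocation from inside the process:

import torch

# VRAM currently allocated by this process, and the peak since startup, in GB
print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.1f} GB")
print(f"peak:      {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")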

Can my 3080 20 GB run this model? My system memory is 32 GB DDR4.

Might be OK; hard to tell, you are right there at the edge. Royrajarshi pushed a change to his GitHub repo that reduces memory use on initialization. The screen captures I shared were from before that change. I think you should be OK. There are also some quantizations released that claim to reduce memory to 16 GB. I haven't tried the quants.
