Deployment
Can anyone tell me the detailed hardware requirements to host this model on a cloud server for fine-tuning? Also, can this model be fine-tuned, and what kind of dataset would be required (audio or text)?
It's similar to Moshi's requirements.
See threads for example deployments:
https://www.linkedin.com/pulse/running-j-moshi-gpu-machine-bhargav-shah-cn6mc/
https://www.reddit.com/r/LocalLLaMA/comments/1fjwc4l/kyutai_labs_open_source_moshi_endtoend_speech_to/
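To give "similar to Moshi's requirements" some rough numbers, here is a weights-only back-of-envelope VRAM estimate. This is a sketch assuming a 7B-parameter LM; the Mimi codec, KV cache, and activations add several more GB on top, so treat it as a lower bound:

```python
# Weights-only memory estimate for a 7B-parameter model at common precisions.
# Runtime overhead (activations, KV cache, audio codec) is NOT included.
params = 7e9  # assumed parameter count

for name, bytes_per_param in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    gib = params * bytes_per_param / 2**30
    print(f"{name}: {gib:.1f} GiB")
# → fp32: 26.1 GiB, bf16/fp16: 13.0 GiB, int8: 6.5 GiB
```

In bf16 the weights alone land around 13 GiB, which is why 24 GB cards like the RTX 3090/4090 come up in the threads above.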
But does it run on MLX yet, like Moshi? If not, are there any plans to make it compatible? (I just can't afford an NVIDIA/CUDA GPU right now, haha.) Love the work you did with this, and I'd love to try implementing it in some personal projects.
MLX Compatibility Status
Hi @pwidenfels,
Currently, PersonaPlex does not officially support MLX (Apple Silicon). The model relies heavily on PyTorch + CUDA for real-time audio streaming inference.
Why MLX is challenging for this model:
- Architecture Complexity: PersonaPlex is built on Moshi, which uses a complex multi-stream audio tokenizer (Mimi) plus a 7B-parameter LM, both optimized for CUDA
- Real-time Requirements: The streaming inference requires very low latency (~80ms per audio frame), which needs careful optimization per platform
- Mimi Codec: The audio encoder/decoder hasn't been ported to MLX yet
Current Options for Non-NVIDIA users:
- Cloud GPU: Use services like RunPod, Vast.ai, or Lambda Labs with NVIDIA GPUs (~$0.30-0.50/hr for RTX 3090/4090)
- Google Colab Pro: T4/A100 access for experimentation
- CPU Offload: The model supports the `--lowvram` flag, which offloads the LM to CPU (works, but with higher latency)
For the NVIDIA team:
If there's community interest, an MLX port would require:
- Porting the Mimi encoder/decoder
- Adapting the streaming LM generation loop
- Testing real-time audio latency on M-series chips
Hope this helps! Feel free to ask if you need help with cloud deployment.
Yesss MLX support would be amazing !!
Hi, why do I keep getting this error?
```
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/home/administrator/PERSONAPLEX/venv/lib/python3.13/site-packages/moshi/server.py", line 287, in
    main()
    ~~~~^^
  File "/home/administrator/PERSONAPLEX/venv/lib/python3.13/site-packages/moshi/server.py", line 227, in main
    mimi = checkpoint_info.get_mimi(device=args.device)
  File "/home/administrator/PERSONAPLEX/venv/lib/python3.13/site-packages/moshi/models/loaders.py", line 284, in get_mimi
    num_codebooks = max(self.lm_config["dep_q"], self.lm_config["n_q"] - self.lm_config["dep_q"])
                        ~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'dep_q'
```
It happens when I try to run `python -m moshi.server --hf-repo "nvidia/personaplex-7b-v1"`.
PersonaPlex doesn't run natively on the moshi server. The architecture is very similar, but the audio stack is a little different.
Do you have access to an NVIDIA GPU? If so, running the steps in the official repo should work properly.
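Concretely, the `KeyError: 'dep_q'` in the traceback comes from moshi's loader indexing a config key that the PersonaPlex checkpoint doesn't provide, so it fails before any weights load. A minimal sketch of that failure mode (the config dict here is hypothetical; only the missing key matters):

```python
# moshi's get_mimi() computes the codebook count by indexing the LM config
# directly, so a checkpoint whose config lacks "dep_q" raises immediately.
lm_config = {"n_q": 8}  # hypothetical config without the "dep_q" entry

try:
    num_codebooks = max(lm_config["dep_q"], lm_config["n_q"] - lm_config["dep_q"])
except KeyError as exc:
    print(f"KeyError: {exc}")  # → KeyError: 'dep_q'
```

That is why pointing the stock moshi server at the PersonaPlex repo fails: the error is a config-schema mismatch, not a corrupted download.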
@Prateektg I found this one, but I haven't tested it yet: https://huggingface.co/eastlondoner/personaplex-mlx/tree/main