update readme
README.md
@@ -27,12 +27,12 @@ When a single piece of audio needs to **sound like a real person**, **pronounce
 
 | Model | Architecture | Size | Model Card | Hugging Face |
 |---|---|---:|---|---|
-| **MOSS-TTS** | MossTTSDelay | 8B | [moss_tts_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_tts_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS) |
-| | MossTTSLocal | 1.7B | [moss_tts_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_tts_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Local-Transformer) |
-| **MOSS‑TTSD‑V1.0** | MossTTSDelay | 8B | [moss_ttsd_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_ttsd_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTSD-v1.0) |
-| **MOSS‑VoiceGenerator** | MossTTSDelay | 1.7B | [moss_voice_generator_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_voice_generator_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-Voice-Generator) |
-| **MOSS‑SoundEffect** | MossTTSDelay | 8B | [moss_sound_effect_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_sound_effect_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-SoundEffect) |
-| **MOSS‑TTS‑Realtime** | MossTTSRealtime | 1.7B | [moss_tts_realtime_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_tts_realtime_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Realtime) |
+| **MOSS-TTS** | MossTTSDelay | 8B | [moss_tts_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_tts_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS) |
+| | MossTTSLocal | 1.7B | [moss_tts_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_tts_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Local-Transformer) |
+| **MOSS‑TTSD‑V1.0** | MossTTSDelay | 8B | [moss_ttsd_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_ttsd_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTSD-v1.0) |
+| **MOSS‑VoiceGenerator** | MossTTSDelay | 1.7B | [moss_voice_generator_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_voice_generator_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-Voice-Generator) |
+| **MOSS‑SoundEffect** | MossTTSDelay | 8B | [moss_sound_effect_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_sound_effect_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-SoundEffect) |
+| **MOSS‑TTS‑Realtime** | MossTTSRealtime | 1.7B | [moss_tts_realtime_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_tts_realtime_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Realtime) |
 
 
 
@@ -112,6 +112,48 @@ For full details, see:
 
 ## 2. Quick Start
 
+
+
+### Environment Setup
+
+We recommend a clean, isolated Python environment with **Transformers 5.0.0** to avoid dependency conflicts.
+
+```bash
+conda create -n moss-tts python=3.12 -y
+conda activate moss-tts
+```
+
+Install all required dependencies:
+
+```bash
+git clone https://github.com/OpenMOSS/MOSS-TTS.git
+cd MOSS-TTS
+pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e .
+```
+
+#### (Optional) Install FlashAttention 2
+
+For better speed and lower GPU memory usage, you can install FlashAttention 2 if your hardware supports it.
+
+```bash
+pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e ".[flash-attn]"
+```
+
+If your machine has limited RAM and many CPU cores, you can cap build parallelism:
+
+```bash
+MAX_JOBS=4 pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e ".[flash-attn]"
+```
+
+Notes:
+- Dependencies are managed in `pyproject.toml`, which currently pins `torch==2.9.1+cu128` and `torchaudio==2.9.1+cu128`.
+- If FlashAttention 2 fails to build on your machine, you can skip it and use the default attention backend.
+- FlashAttention 2 is only available on supported GPUs and is typically used with `torch.float16` or `torch.bfloat16`.
+
+
+### Basic Usage
+
+
 > Tip: For evaluation and research purposes, we recommend using **MOSS-TTSLocal-1.7B**.
 
 MOSS-TTS provides a convenient `generate` interface for rapid usage. The examples below cover:
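The first hunk changes only where the model cards live: every `.../blob/main/<card>.md` URL gains a `docs/` path segment, and the Hugging Face links are untouched. The sketch below performs that same rewrite with `sed`. It assumes every card file follows the `moss_*_model_card.md` naming seen in the table. It is an illustration and not necessarily how this commit was produced; the function name `rewrite_card_links` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): move root-level model-card links under docs/.
# Assumption: card files are named moss_<name>_model_card.md, as in the README table.
set -euo pipefail

rewrite_card_links() {
  # Insert "docs/" between "/blob/main/" and a moss_*_model_card.md filename.
  # Links that already read "/blob/main/docs/moss_..." no longer match,
  # so running the rewrite twice is harmless.
  sed -E 's#(/blob/main/)(moss_[a-z_]+_model_card\.md)#\1docs/\2#g'
}

# When given a file path, rewrite it through a temp file. This avoids the
# GNU/BSD difference in `sed -i` and keeps the file's original permissions.
if [[ $# -gt 0 ]]; then
  tmp="$(mktemp)"
  trap 'rm -f "$tmp"' EXIT
  rewrite_card_links < "$1" > "$tmp"
  cat "$tmp" > "$1"
fi
```

If the script is saved under an illustrative name such as `rewrite_card_links.sh`, running `bash rewrite_card_links.sh README.md` applies the change in place. Reviewing the result with `git diff` should show only the six model-card URLs changing.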
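After the setup steps in the second hunk, a quick report can confirm what actually got installed. This is a minimal sketch. It assumes `python3` resolves to the interpreter of the activated `moss-tts` environment, and the function name `check_moss_env` is hypothetical. The expected versions come from the README text (`torch==2.9.1+cu128`, `torchaudio==2.9.1+cu128`, Transformers 5.0.0). `flash_attn` is only reported because it is optional.

```shell
#!/usr/bin/env bash
# Hypothetical environment report (sketch) for the Quick Start setup.
# Assumption: `python3` is the interpreter of the activated moss-tts env.
set -euo pipefail

check_moss_env() {
  python3 - <<'PY'
import importlib

# Versions named in the README; adjust if pyproject.toml changes.
expected = {
    "torch": "2.9.1+cu128",
    "torchaudio": "2.9.1+cu128",
    "transformers": "5.0.0",
}
mods = {}

for name in ("torch", "torchaudio", "transformers", "flash_attn"):
    try:
        mods[name] = importlib.import_module(name)
    except ModuleNotFoundError:
        # flash_attn is optional; the other packages are required.
        status = "not installed (optional)" if name == "flash_attn" else "MISSING"
        print(f"{name}: {status}")
        continue
    except Exception as exc:  # e.g. a compiled wheel built against another torch
        print(f"{name}: import failed: {exc}")
        continue
    version = getattr(mods[name], "__version__", "unknown")
    want = expected.get(name)
    note = "" if want is None or version == want else f" (README expects {want})"
    print(f"{name}: {version}{note}")

torch = mods.get("torch")
if torch is None:
    print("cuda available: unknown (torch not importable)")
else:
    print(f"cuda available: {torch.cuda.is_available()}")
PY
}

check_moss_env
```

The script only reports and always exits successfully. A version mismatch is flagged inline rather than treated as an error, because the pins in `pyproject.toml` may move ahead of this README.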