Cqy2019 nielsr HF Staff committed on
Commit c7cd852 · 1 Parent(s): 1167e01

Improve model card: add library_name, pipeline_tag and update paper link (#1)

- Improve model card: add library_name, pipeline_tag and update paper link (c37bee83d3b5823d8c8eed152ea7481acfdb6bb3)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +26 -5
README.md CHANGED
@@ -1,7 +1,4 @@
  ---
- license: apache-2.0
- tags:
- - text-to-speech
  language:
  - zh
  - en
@@ -23,7 +20,13 @@ language:
  - hu
  - el
  - tr
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-to-speech
+ tags:
+ - text-to-speech
  ---
+
  # MOSS-TTS Family

  <br>
@@ -39,7 +42,7 @@ language:
  <a href="https://github.com/OpenMOSS/MOSS-TTS/tree/main"><img src="https://img.shields.io/badge/Project%20Page-GitHub-blue"></a>
  <a href="https://modelscope.cn/collections/OpenMOSS-Team/MOSS-TTS"><img src="https://img.shields.io/badge/ModelScope-Models-lightgrey?logo=modelscope&amp"></a>
  <a href="https://mosi.cn/#models"><img src="https://img.shields.io/badge/Blog-View-blue?logo=internet-explorer&amp"></a>
- <a href="https://github.com/OpenMOSS/MOSS-TTS"><img src="https://img.shields.io/badge/Arxiv-Coming%20soon-red?logo=arxiv&amp"></a>
+ <a href="https://huggingface.co/papers/2602.10934"><img src="https://img.shields.io/badge/Arxiv-2602.10934-red?logo=arxiv&amp"></a>

  <a href="https://studio.mosi.cn"><img src="https://img.shields.io/badge/AIStudio-Try-green?logo=internet-explorer&amp"></a>
  <a href="https://studio.mosi.cn/docs/moss-tts"><img src="https://img.shields.io/badge/API-Docs-00A3FF?logo=fastapi&amp"></a>
@@ -50,6 +53,8 @@ language:
  ## Overview
  MOSS‑TTS Family is an open‑source **speech and sound generation model family** from [MOSI.AI](https://mosi.cn/#hero) and the [OpenMOSS team](https://www.open-moss.com/). It is designed for **high‑fidelity**, **high‑expressiveness**, and **complex real‑world scenarios**, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS.

+ The model architecture and tokenizer are detailed in the paper [MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models](https://huggingface.co/papers/2602.10934).
+

  ## Introduction

@@ -386,4 +391,20 @@ For open-source models, annotators are asked to score each sample pair in terms
  For closed-source models, annotators are only asked to choose the overall preferred one in each pair, and we compute the win rate accordingly.
  <p align="center">
  <img src="https://speech-demo.oss-cn-shanghai.aliyuncs.com/moss_tts_demo/tts_readme_imgaes_demo/moss_ttsd_winrate" width="100%" />
- </p>
+ </p>
+
+ ## Citation
+
+ If you use this code or result in your paper, please cite our work as:
+
+ ```bibtex
+ @misc{gong2026mossaudiotokenizerscalingaudiotokenizers,
+ title={MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models},
+ author={Yitian Gong and Kuangwei Chen and Zhaoye Fei and Xiaogui Yang and Ke Chen and Yang Wang and Kexin Huang and Mingshu Chen and Ruixiao Li and Qingyuan Cheng and Shimin Li and Xipeng Qiu},
+ year={2026},
+ eprint={2602.10934},
+ archivePrefix={arXiv},
+ primaryClass={cs.SD},
+ url={https://arxiv.org/abs/2602.10934},
+ }
+ ```