Commit 6e8bf13 by nielsr (HF Staff) · verified · 1 parent: f392cab

Add paper link and library_name metadata

Hi, I'm Niels from the Hugging Face team.

This model is part of the MOSS-TTS family, which builds upon the CAT architecture described in the recent paper: [MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models](https://huggingface.co/papers/2602.10934).

I've opened this PR to:
- Update the arXiv badge from "Coming soon" to point to the technical report on arXiv (2602.10934).
- Add `library_name: transformers` to the metadata (since it's a custom model compatible with the library).
- Set the `pipeline_tag` to `text-to-speech` for improved discoverability.
- Add a "Paper Information" section and a BibTeX citation for the research paper.

These changes help users find the research context and improve the model's visibility on the Hub.
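The metadata fields listed above live in the YAML front matter at the top of `README.md`, between the two `---` lines, which is where the Hub reads `pipeline_tag`, `library_name`, `license`, and `tags` from. As a rough illustration, the following stdlib-only sketch extracts that block and reads the flat fields. It handles only the simple `key: value` and `- item` subset used in this card and is not a general YAML parser (real tooling such as `huggingface_hub` uses a full YAML library). `parse_front_matter` is a hypothetical helper name, and the sample README is an abridged copy of the updated front matter with the `language` list shortened.

```python
# Minimal sketch (assumption: only the flat "key: value" / "- item" subset of
# YAML used in this model card). Not a general YAML parser.

DELIM = "---"


def parse_front_matter(readme):
    """Return the metadata between the first two '---' lines as a dict.

    Hypothetical helper for illustration; returns {} if there is no
    well-formed front-matter block.
    """
    lines = readme.splitlines()
    if not lines or lines[0].strip() != DELIM:
        return {}
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == DELIM:
            return meta  # closing delimiter: metadata block complete
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: attach it to the most recent "key:" that opened a list.
            if isinstance(meta.get(current_key), list):
                meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # "key:" with no value opens a list; otherwise store the scalar.
            meta[current_key] = value if value else []
    return {}  # no closing delimiter found


# Abridged copy of the updated front matter (language list shortened).
readme = """---
language:
- zh
- en
license: apache-2.0
pipeline_tag: text-to-speech
library_name: transformers
tags:
- text-to-speech
- audio
- moss-tts
---

# MOSS-TTS Family
"""

meta = parse_front_matter(readme)
print(meta["pipeline_tag"], meta["library_name"], meta["tags"])
```

Key order inside the block does not matter, so moving `license` and `tags` below `language`, as this commit does, only changes the layout of the metadata.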

Files changed (1)
  1. README.md +25 -5
README.md CHANGED
@@ -1,7 +1,4 @@
  ---
- license: apache-2.0
- tags:
- - text-to-speech
  language:
  - zh
  - en
@@ -23,7 +20,15 @@ language:
  - hu
  - el
  - tr
+ license: apache-2.0
+ pipeline_tag: text-to-speech
+ library_name: transformers
+ tags:
+ - text-to-speech
+ - audio
+ - moss-tts
  ---
+
  # MOSS-TTS Family
 
  <br>
@@ -39,7 +44,7 @@ language:
  <a href="https://github.com/OpenMOSS/MOSS-TTS/tree/main"><img src="https://img.shields.io/badge/Project%20Page-GitHub-blue"></a>
  <a href="https://modelscope.cn/collections/OpenMOSS-Team/MOSS-TTS"><img src="https://img.shields.io/badge/ModelScope-Models-lightgrey?logo=modelscope&amp"></a>
  <a href="https://mosi.cn/#models"><img src="https://img.shields.io/badge/Blog-View-blue?logo=internet-explorer&amp"></a>
- <a href="https://github.com/OpenMOSS/MOSS-TTS"><img src="https://img.shields.io/badge/Arxiv-Coming%20soon-red?logo=arxiv&amp"></a>
+ <a href="https://arxiv.org/abs/2602.10934"><img src="https://img.shields.io/badge/Arxiv-2602.10934-red?logo=arxiv&amp"></a>
 
  <a href="https://studio.mosi.cn"><img src="https://img.shields.io/badge/AIStudio-Try-green?logo=internet-explorer&amp"></a>
  <a href="https://studio.mosi.cn/docs/moss-tts"><img src="https://img.shields.io/badge/API-Docs-00A3FF?logo=fastapi&amp"></a>
@@ -50,6 +55,9 @@ language:
  ## Overview
  MOSS‑TTS Family is an open‑source **speech and sound generation model family** from [MOSI.AI](https://mosi.cn/#hero) and the [OpenMOSS team](https://www.open-moss.com/). It is designed for **high‑fidelity**, **high‑expressiveness**, and **complex real‑world scenarios**, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS.
 
+ ## Paper Information
+
+ This model is based on the research presented in the paper: **[MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models](https://huggingface.co/papers/2602.10934)**.
 
  ## Introduction
 
@@ -141,4 +149,16 @@ Please refer to the following GitHub repository for detailed usage instructions
  👉 **Usage Guide**:
  https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_tts_realtime_model_card.md
 
- ---
+ ## Citation
+ If you use this code or result in your paper, please cite our work as:
+ ```tex
+ @misc{gong2026mossaudiotokenizerscalingaudiotokenizers,
+ title={MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models},
+ author={Yitian Gong and Kuangwei Chen and Zhaoye Fei and Xiaogui Yang and Ke Chen and Yang Wang and Kexin Huang and Mingshu Chen and Ruixiao Li and Qingyuan Cheng and Shimin Li and Xipeng Qiu},
+ year={2026},
+ eprint={2602.10934},
+ archivePrefix={arXiv},
+ primaryClass={cs.SD},
+ url={https://arxiv.org/abs/2602.10934},
+ }
+ ```