Merge pull request #17 from ace-step/add_model_zoo

- README.md +24 -1
- acestep/acestep_v15_pipeline.py +1 -1
- acestep/api_server.py +4 -4
- assets/model_zoo.png +3 -0
- generate_examples.py +2 -2
- profile_inference.py +2 -2
README.md
CHANGED
@@ -19,6 +19,8 @@
 - [📦 Installation](#-installation)
 - [🚀 Usage](#-usage)
 - [🔨 Train](#-train)
+- [🏗️ Architecture](#️-architecture)
+- [🦁 Model Zoo](#-model-zoo)
 
 ## 📝 Abstract
 We present ACE-Step v1.5, a highly efficient foundation model that democratizes commercial-grade music production on consumer hardware. Optimized for local deployment (<4GB VRAM), the model accelerates generation by over 100× compared to traditional pure LM architectures, producing superior high-fidelity audio in seconds characterized by coherent semantics and exceptional melodies. At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints—scaling from short loops to 10-minute compositions—while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). Uniquely, this alignment is achieved through intrinsic reinforcement learning relying solely on the model’s internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities—such as cover generation, repainting, and vocal-to-BGM conversion—while maintaining strict adherence to prompts across 50+ languages.
@@ -31,7 +33,7 @@ We present ACE-Step v1.5, a highly efficient foundation model that democratizes
 </p>
 
 ### ⚡ Performance
-- ✅ **Ultra-Fast Generation** — 0.5s to 10s generation time (depending on think mode & diffusion steps)
+- ✅ **Ultra-Fast Generation** — 0.5s to 10s generation time on A100 (depending on think mode & diffusion steps)
 - ✅ **Flexible Duration** — Supports 10 seconds to 10 minutes (600s) audio generation
 - ✅ **Batch Generation** — Generate up to 8 songs simultaneously
 
@@ -159,7 +161,28 @@ See the **LoRA Training** tab in Gradio UI for one-click training, or check [Gra
   <img src="./assets/ACE-Step_framework.png" width="100%" alt="ACE-Step Framework">
 </p>
 
+## 🦁 Model Zoo
+
+<p align="center">
+  <img src="./assets/model_zoo.png" width="100%" alt="Model Zoo">
+</p>
+
+### DiT Models
+
+| DiT Model | Pre-Training | SFT | RL | CFG | Step | Refer audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |
+|-----------|:------------:|:---:|:--:|:---:|:----:|:-----------:|:----------:|:-----:|:-------:|:-------:|:----:|:--------:|:-------:|:---------:|:---------------:|--------------|
+| `acestep-v15-base` | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | High | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-base) |
+| `acestep-v15-sft` | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | High | Medium | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-sft) |
+| `acestep-v15-turbo` | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | [Link](https://huggingface.co/ACE-Step/Ace-Step1.5) |
+| `acestep-v15-turbo-rl` | ✅ | ✅ | ✅ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | To be released |
+
+### LM Models
+
+| LM Model | Pretrain from | Pre-Training | SFT | RL | CoT metas | Query rewrite | Audio Understanding | Composition Capability | Copy Melody | Hugging Face |
+|----------|---------------|:------------:|:---:|:--:|:---------:|:-------------:|:-------------------:|:----------------------:|:-----------:|--------------|
+| `acestep-5Hz-lm-0.6B` | Qwen3-0.6B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Weak | ✅ |
+| `acestep-5Hz-lm-1.7B` | Qwen3-1.7B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Medium | ✅ |
+| `acestep-5Hz-lm-4B` | Qwen3-4B | ✅ | ✅ | ✅ | ✅ | ✅ | Strong | Strong | Strong | To be released |
 
 ## 📜 License & Disclaimer
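The model identifiers in the tables above match the defaults touched elsewhere in this PR: `ACESTEP_CONFIG_PATH` / `--config_path` for the DiT checkpoints and `ACESTEP_LM_MODEL_PATH` / `--lm_model_path` for the 5Hz LMs. As a rough sketch only (the diff does not show a serving entry point), one might point the API server at a specific pair from the zoo like this, assuming `create_app()` from `acestep/api_server.py` is importable and that running it under uvicorn is acceptable:

```python
# Illustrative only: select a Model Zoo pair via the environment variables read
# in acestep/api_server.py. Running create_app() under uvicorn is an assumption,
# not something stated in this PR.
import os

# Set before the app is created, since the defaults are read at initialization.
os.environ["ACESTEP_CONFIG_PATH"] = "acestep-v15-turbo"      # DiT model (see table above)
os.environ["ACESTEP_LM_MODEL_PATH"] = "acestep-5Hz-lm-0.6B"  # 5Hz LM (see table above)
os.environ["ACESTEP_LM_BACKEND"] = "vllm"                    # "vllm" or "pt"
os.environ["ACESTEP_DEVICE"] = "auto"

import uvicorn
from acestep.api_server import create_app

uvicorn.run(create_app(), host="127.0.0.1", port=8000)
```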
acestep/acestep_v15_pipeline.py
CHANGED
@@ -129,7 +129,7 @@ def main():
     # Service initialization arguments
     parser.add_argument("--init_service", type=lambda x: x.lower() in ['true', '1', 'yes'], default=False, help="Initialize service on startup (default: False)")
     parser.add_argument("--checkpoint", type=str, default=None, help="Checkpoint file path (optional, for display purposes)")
-    parser.add_argument("--config_path", type=str, default=None, help="Main model path (e.g., 'acestep-v15-turbo
+    parser.add_argument("--config_path", type=str, default=None, help="Main model path (e.g., 'acestep-v15-turbo')")
     parser.add_argument("--device", type=str, default="auto", choices=["auto", "cuda", "cpu"], help="Processing device (default: auto)")
     parser.add_argument("--init_llm", type=lambda x: x.lower() in ['true', '1', 'yes'], default=True, help="Initialize 5Hz LM (default: True)")
     parser.add_argument("--lm_model_path", type=str, default=None, help="5Hz LM model path (e.g., 'acestep-5Hz-lm-0.6B')")
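The hunk above only fixes the `--config_path` help string, but it shows the boolean-flag convention this script uses (`type=lambda x: x.lower() in ['true', '1', 'yes']`). Below is a small standalone sketch of that pattern, kept separate from the real `main()` since the rest of the parser is not part of this diff:

```python
# Standalone sketch of the argument style used in acestep_v15_pipeline.py's main().
# Only the flags visible in the hunk are reproduced; everything else is omitted.
import argparse

def str2bool(x: str) -> bool:
    # Same truthy-string rule as the lambda in the diff:
    # "true"/"1"/"yes" -> True, anything else (including "false") -> False.
    return x.lower() in ['true', '1', 'yes']

parser = argparse.ArgumentParser()
parser.add_argument("--init_service", type=str2bool, default=False)
parser.add_argument("--checkpoint", type=str, default=None)
parser.add_argument("--config_path", type=str, default=None)
parser.add_argument("--device", type=str, default="auto", choices=["auto", "cuda", "cpu"])
parser.add_argument("--init_llm", type=str2bool, default=True)
parser.add_argument("--lm_model_path", type=str, default=None)

args = parser.parse_args(["--init_service", "yes", "--config_path", "acestep-v15-turbo"])
print(args.init_service, args.config_path)  # True acestep-v15-turbo
```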
acestep/api_server.py
CHANGED
@@ -608,7 +608,7 @@ def create_app() -> FastAPI:
     app.state.handler3 = handler3
     app.state._initialized2 = False
     app.state._initialized3 = False
-    app.state._config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo
+    app.state._config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo")
     app.state._config_path2 = config_path2
     app.state._config_path3 = config_path3
 
@@ -661,7 +661,7 @@ def create_app() -> FastAPI:
         raise RuntimeError(app.state._init_error)
 
     project_root = _get_project_root()
-    config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo
+    config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo")
     device = os.getenv("ACESTEP_DEVICE", "auto")
 
     use_flash_attention = _env_bool("ACESTEP_USE_FLASH_ATTENTION", True)
@@ -868,7 +868,7 @@ def create_app() -> FastAPI:
 
     project_root = _get_project_root()
     checkpoint_dir = os.path.join(project_root, "checkpoints")
-    lm_model_path = (req.lm_model_path or os.getenv("ACESTEP_LM_MODEL_PATH") or "acestep-5Hz-lm-0.6B
+    lm_model_path = (req.lm_model_path or os.getenv("ACESTEP_LM_MODEL_PATH") or "acestep-5Hz-lm-0.6B").strip()
     backend = (req.lm_backend or os.getenv("ACESTEP_LM_BACKEND") or "vllm").strip().lower()
     if backend not in {"vllm", "pt"}:
         backend = "vllm"
@@ -1195,7 +1195,7 @@ def create_app() -> FastAPI:
         return s
 
     # Get model information
-    lm_model_name = os.getenv("ACESTEP_LM_MODEL_PATH", "acestep-5Hz-lm-0.6B
+    lm_model_name = os.getenv("ACESTEP_LM_MODEL_PATH", "acestep-5Hz-lm-0.6B")
     # Use selected_model_name (set at the beginning of _run_one_job)
     dit_model_name = selected_model_name
 
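All four hunks in this file apply the same pattern: a request field (where present) falls back to an environment variable and then to a hard-coded default, and the LM backend is additionally whitelisted to `{"vllm", "pt"}`. A minimal sketch of that resolution logic in isolation; the function name and signature below are illustrative and do not exist in `api_server.py`:

```python
# Illustrative helper mirroring the fallback chain used in the hunks above:
# request value -> environment variable -> hard-coded default.
import os
from typing import Optional, Tuple

def resolve_lm_settings(req_lm_model_path: Optional[str] = None,
                        req_lm_backend: Optional[str] = None) -> Tuple[str, str]:
    lm_model_path = (req_lm_model_path
                     or os.getenv("ACESTEP_LM_MODEL_PATH")
                     or "acestep-5Hz-lm-0.6B").strip()
    backend = (req_lm_backend or os.getenv("ACESTEP_LM_BACKEND") or "vllm").strip().lower()
    if backend not in {"vllm", "pt"}:
        backend = "vllm"  # unknown backends fall back to the default
    return lm_model_path, backend

# Expected output with the ACESTEP_* variables unset:
print(resolve_lm_settings(req_lm_backend="PT"))      # ('acestep-5Hz-lm-0.6B', 'pt')
print(resolve_lm_settings(req_lm_backend="sglang"))  # ('acestep-5Hz-lm-0.6B', 'vllm')
```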
assets/model_zoo.png
ADDED
(binary image, stored with Git LFS)
generate_examples.py
CHANGED
@@ -39,8 +39,8 @@ def generate_examples(num_examples=50, output_dir="examples/text2music", start_i
         logger.error("No 5Hz LM models found in checkpoints directory")
         return
 
-    # Prefer acestep-5Hz-lm-0.6B
-    lm_model = "acestep-5Hz-lm-0.6B
+    # Prefer acestep-5Hz-lm-0.6B if available
+    lm_model = "acestep-5Hz-lm-0.6B" if "acestep-5Hz-lm-0.6B" in available_models else available_models[0]
     logger.info(f"Using LM model: {lm_model}")
 
     # Initialize LM
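The change above replaces a hard-coded LM name with a preference-plus-fallback choice. A toy illustration of that selection behaviour with made-up model lists (the diff does not show how `available_models` is discovered):

```python
# Toy illustration of the selection logic added above; the sample lists are invented.
PREFERRED = "acestep-5Hz-lm-0.6B"

def pick_lm_model(available_models):
    # Prefer the 0.6B LM when present, otherwise fall back to the first entry.
    return PREFERRED if PREFERRED in available_models else available_models[0]

print(pick_lm_model(["acestep-5Hz-lm-1.7B", "acestep-5Hz-lm-0.6B"]))  # acestep-5Hz-lm-0.6B
print(pick_lm_model(["acestep-5Hz-lm-1.7B"]))                         # acestep-5Hz-lm-1.7B
```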
profile_inference.py
CHANGED
@@ -40,8 +40,8 @@ if project_root not in sys.path:
 def load_env_config():
     """Load configuration from the .env file"""
     env_config = {
-        'ACESTEP_CONFIG_PATH': 'acestep-v15-turbo
-        'ACESTEP_LM_MODEL_PATH': 'acestep-5Hz-lm-0.6B
+        'ACESTEP_CONFIG_PATH': 'acestep-v15-turbo',
+        'ACESTEP_LM_MODEL_PATH': 'acestep-5Hz-lm-0.6B',
         'ACESTEP_DEVICE': 'auto',
         'ACESTEP_LM_BACKEND': 'vllm',
     }
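This hunk only fixes the default values in `env_config`; the actual `.env` parsing in `profile_inference.py` is not part of the diff. A hedged sketch of what such a loader could look like, reusing the same keys and defaults, purely as an assumption about the surrounding code rather than the project's implementation:

```python
# Hedged sketch of a .env loader that overlays file values on the defaults shown
# in the hunk above. The parsing details are assumptions, not profile_inference.py's code.
import os

def load_env_config(env_file: str = ".env") -> dict:
    env_config = {
        'ACESTEP_CONFIG_PATH': 'acestep-v15-turbo',
        'ACESTEP_LM_MODEL_PATH': 'acestep-5Hz-lm-0.6B',
        'ACESTEP_DEVICE': 'auto',
        'ACESTEP_LM_BACKEND': 'vllm',
    }
    if os.path.exists(env_file):
        with open(env_file, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue  # skip blanks, comments, and malformed lines
                key, value = line.split("=", 1)
                if key.strip() in env_config:
                    env_config[key.strip()] = value.strip()
    return env_config
```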