---
title: ACE-Step v1.5
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Music Generation Foundation Model v1.5
---
<h1 align="center">ACE-Step 1.5</h1>
<h1 align="center">Pushing the Boundaries of Open-Source Music Generation</h1>
<p align="center">
<a href="https://ace-step.github.io/ace-step-v1.5.github.io/">Project</a> |
<a href="https://huggingface.co/collections/ACE-Step/ace-step-15">Hugging Face</a> |
<a href="https://modelscope.cn/models/ACE-Step/ACE-Step-v1-5">ModelScope</a> |
<a href="https://huggingface.co/spaces/ACE-Step/Ace-Step-v1.5">Space Demo</a> |
<a href="https://discord.gg/PeWDxrkdj7">Discord</a> |
<a href="https://arxiv.org/abs/2506.00045">Technical Report</a>
</p>
<p align="center">
<img src="./assets/orgnization_logos.png" width="100%" alt="StepFun Logo">
</p>
## Table of Contents
- [✨ Features](#-features)
- [📦 Installation](#-installation)
- [🚀 Usage](#-usage)
- [🔨 Train](#-train)
- [🏗️ Architecture](#️-architecture)
- [🦁 Model Zoo](#-model-zoo)
## 📝 Abstract
We present ACE-Step v1.5, a highly efficient foundation model that brings commercial-grade music production to consumer hardware. Optimized for local deployment (<4GB VRAM), the model generates over 100× faster than traditional pure LM architectures, producing high-fidelity audio with coherent semantics and strong melodies in seconds. At its core lies a novel hybrid architecture in which the Language Model (LM) acts as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints, scaling from short loops to 10-minute compositions, while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). This alignment is achieved through intrinsic reinforcement learning that relies solely on the model's internal mechanisms, eliminating the biases inherent in external reward models or human preferences. Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities, such as cover generation, repainting, and vocal-to-BGM conversion, while maintaining strict adherence to prompts across 50+ languages.
## ✨ Features
<p align="center">
<img src="./assets/application_map.png" width="100%" alt="ACE-Step Framework">
</p>
### ⚡ Performance
- ✅ **Ultra-Fast Generation**: 0.5s to 10s generation time on A100 (depending on think mode and diffusion steps)
- ✅ **Flexible Duration**: Supports audio generation from 10 seconds to 10 minutes (600s)
- ✅ **Batch Generation**: Generate up to 8 songs simultaneously
### 🎵 Generation Quality
- ✅ **Commercial-Grade Output**: Quality between Suno v4.5 and Suno v5
- ✅ **Rich Style Support**: 1000+ instruments and styles with fine-grained timbre description
- ✅ **Multi-Language Lyrics**: Supports 50+ languages, with lyrics prompts for structure and style control
### 🎛️ Versatility & Control
| Feature | Description |
|---------|-------------|
| ✅ Reference Audio Input | Use reference audio to guide generation style |
| ✅ Cover Generation | Create covers from existing audio |
| ✅ Repaint & Edit | Selective local audio editing and regeneration |
| ✅ Track Separation | Separate audio into individual stems |
| ✅ Multi-Track Generation | Add layers like Suno Studio's "Add Layer" feature |
| ✅ Vocal2BGM | Auto-generate accompaniment for vocal tracks |
| ✅ Metadata Control | Control duration, BPM, key/scale, time signature |
| ✅ Simple Mode | Generate full songs from simple descriptions |
| ✅ Query Rewriting | Auto LM expansion of tags and lyrics |
| ✅ Audio Understanding | Extract BPM, key/scale, time signature & caption from audio |
| ✅ LRC Generation | Auto-generate lyric timestamps for generated music |
| ✅ LoRA Training | One-click annotation & training in Gradio. 8 songs, 1 hour on 3090 (12GB VRAM) |
| ✅ Quality Scoring | Automatic quality assessment for generated audio |
## 📦 Installation
> **Requirements:** Python 3.11, CUDA GPU recommended (works on CPU/MPS but slower)
### 1. Install uv (Package Manager)
```bash
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```
### 2. Clone & Install
```bash
git clone https://github.com/ACE-Step/ACE-Step-1.5.git
cd ACE-Step-1.5
uv sync
```
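Since the requirements call for Python 3.11, it can be worth confirming which interpreter the synced environment actually uses. The snippet below is a minimal sketch: `is_python_311` and `check_env_python` are hypothetical helper names (not part of ACE-Step or uv), and it assumes you run it from inside the cloned repository after `uv sync`.

```bash
# Sketch: confirm the uv-managed environment runs Python 3.11.x.
# Both function names are illustrative, not part of ACE-Step or uv.

# Succeeds if the given `python --version` output reports a 3.11.x release.
is_python_311() {
  case "$1" in
    "Python 3.11."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Asks the project environment for its interpreter version and reports the result.
check_env_python() {
  local version
  version="$(uv run python --version 2>&1)"
  if is_python_311 "$version"; then
    echo "OK: $version"
  else
    echo "Unexpected interpreter: $version" >&2
    return 1
  fi
}
```

Calling `check_env_python` prints the detected version, or returns a non-zero status if the environment resolved to a different Python release.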
### 3. Launch
#### 🖥️ Gradio Web UI (Recommended)
```bash
uv run acestep
```
Open http://localhost:7860 in your browser. Models will be downloaded automatically on first run.
#### 🌐 REST API Server
```bash
uv run acestep-api
```
API runs at http://localhost:8001. See [API Documentation](./docs/en/API.md) for endpoints.
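When scripting against either server, it may help to wait until the port answers before sending requests, since startup can take a while. The following is a minimal sketch using a hypothetical `wait_for_http` helper; it only checks that something responds over HTTP at the base URL and does not assume any particular ACE-Step endpoint.

```bash
# Sketch: poll a URL until it answers over HTTP or the attempts run out.
# wait_for_http is an illustrative helper, not part of ACE-Step.
# The second argument is the number of one-second polling attempts, so the
# real wait can be longer when individual requests run into --max-time.
wait_for_http() {
  local url="$1"
  local attempts="${2:-60}"
  local tried=0
  while [ "$tried" -lt "$attempts" ]; do
    # curl exits 0 once the server returns any HTTP response, even an error status.
    if curl --silent --output /dev/null --max-time 2 "$url"; then
      return 0
    fi
    sleep 1
    tried=$((tried + 1))
  done
  return 1
}

# Example (run after starting the server with `uv run acestep-api`):
# wait_for_http http://localhost:8001 120 && echo "API server is up"
```

The same helper works for the Gradio UI by pointing it at `http://localhost:7860`.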
### Command Line Options
**Gradio UI (`acestep`):**
| Option | Default | Description |
|--------|---------|-------------|
| `--port` | 7860 | Server port |
| `--server-name` | 127.0.0.1 | Server address (use `0.0.0.0` for network access) |
| `--share` | false | Create public Gradio link |
| `--language` | en | UI language: `en`, `zh`, `ja` |
| `--init_service` | false | Auto-initialize models on startup |
| `--config_path` | auto | DiT model (e.g., `acestep-v15-turbo`, `acestep-v15-turbo-shift3`) |
| `--lm_model_path` | auto | LM model (e.g., `acestep-5Hz-lm-0.6B`, `acestep-5Hz-lm-1.7B`) |
| `--offload_to_cpu` | auto | CPU offload (auto-enabled if VRAM < 16GB) |
**Examples:**
```bash
# Public access with Chinese UI
uv run acestep --server-name 0.0.0.0 --share --language zh
# Pre-initialize models on startup
uv run acestep --init_service true --config_path acestep-v15-turbo
```
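The options table notes that CPU offload is auto-enabled when VRAM is below 16GB. To see which side of that threshold a machine falls on, a sketch like the one below can help. It assumes an NVIDIA GPU with `nvidia-smi` on the PATH and treats "16GB" as 16384 MiB, which may differ from how the project itself rounds; `offload_expected` and `detect_vram_mib` are hypothetical helper names.

```bash
# Sketch: estimate whether CPU offload would be auto-enabled on this machine.
# Assumption: the 16GB threshold from the options table is taken as 16384 MiB.
OFFLOAD_THRESHOLD_MIB=16384

# Prints "yes" if the given VRAM size (MiB) is below the threshold, else "no".
# Returns status 2 if the argument is not a non-negative integer.
offload_expected() {
  case "$1" in
    ''|*[!0-9]*) echo "not a number: '$1'" >&2; return 2 ;;
  esac
  if [ "$1" -lt "$OFFLOAD_THRESHOLD_MIB" ]; then
    echo "yes"
  else
    echo "no"
  fi
}

# Prints the total memory of the first GPU in MiB, as reported by nvidia-smi.
detect_vram_mib() {
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
    | head -n 1 | tr -d ' '
}
```

For example, `offload_expected "$(detect_vram_mib)"` prints `yes` on a 12 GB card and `no` on a 24 GB card.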
### Development
```bash
# Add dependencies
uv add package-name
uv add --dev package-name
# Update all dependencies
uv sync --upgrade
```
## 🚀 Usage
We provide multiple ways to use ACE-Step:
| Method | Description | Documentation |
|--------|-------------|---------------|
| 🖥️ **Gradio Web UI** | Interactive web interface for music generation | [Gradio Guide](./docs/en/GRADIO_GUIDE.md) |
| 🐍 **Python API** | Programmatic access for integration | [Inference API](./docs/en/INFERENCE.md) |
| 🌐 **REST API** | HTTP-based async API for services | [REST API](./docs/en/API.md) |
**📚 Documentation available in:** [English](./docs/en/) | [中文](./docs/zh/) | [日本語](./docs/ja/)
## 🔨 Train
See the **LoRA Training** tab in Gradio UI for one-click training, or check [Gradio Guide - LoRA Training](./docs/en/GRADIO_GUIDE.md#lora-training) for details.
## 🏗️ Architecture
<p align="center">
<img src="./assets/ACE-Step_framework.png" width="100%" alt="ACE-Step Framework">
</p>
## 🦁 Model Zoo
<p align="center">
<img src="./assets/model_zoo.png" width="100%" alt="Model Zoo">
</p>
### DiT Models
| DiT Model | Pre-Training | SFT | RL | CFG | Step | Refer audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |
|-----------|:------------:|:---:|:--:|:---:|:----:|:-----------:|:----------:|:-----:|:-------:|:-------:|:----:|:--------:|:-------:|:---------:|:---------------:|--------------|
| `acestep-v15-base` | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | High | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-base) |
| `acestep-v15-sft` | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | High | Medium | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-sft) |
| `acestep-v15-turbo` | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | [Link](https://huggingface.co/ACE-Step/Ace-Step1.5) |
| `acestep-v15-turbo-rl` | ✅ | ✅ | ✅ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | To be released |
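
When scripting around several DiT checkpoints, the Step column above can be captured in a small lookup. This is only an illustrative sketch: `dit_steps` is a hypothetical helper, not part of the ACE-Step CLI, and the values are copied from the table rather than read from the models.

```bash
# Sketch: look up the "Step" column of the DiT Models table by model name.
# dit_steps is an illustrative helper, not part of ACE-Step.
dit_steps() {
  case "$1" in
    acestep-v15-base|acestep-v15-sft) echo 50 ;;
    acestep-v15-turbo|acestep-v15-turbo-rl) echo 8 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}
```

For example, `dit_steps acestep-v15-turbo` prints `8`, while an unrecognized name returns a non-zero status.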
### LM Models
| LM Model | Pretrain from | Pre-Training | SFT | RL | CoT metas | Query rewrite | Audio Understanding | Composition Capability | Copy Melody | Hugging Face |
|----------|---------------|:------------:|:---:|:--:|:---------:|:-------------:|:-------------------:|:----------------------:|:-----------:|--------------|
| `acestep-5Hz-lm-0.6B` | Qwen3-0.6B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Weak | ✅ |
| `acestep-5Hz-lm-1.7B` | Qwen3-1.7B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Medium | ✅ |
| `acestep-5Hz-lm-4B` | Qwen3-4B | ✅ | ✅ | ✅ | ✅ | ✅ | Strong | Strong | Strong | To be released |
## 📜 License & Disclaimer
This project is licensed under [MIT](./LICENSE).
ACE-Step enables original music generation across diverse genres, with applications in creative production, education, and entertainment. While designed to support positive and artistic use cases, we acknowledge potential risks such as unintentional copyright infringement due to stylistic similarity, inappropriate blending of cultural elements, and misuse for generating harmful content. To ensure responsible use, we encourage users to verify the originality of generated works, clearly disclose AI involvement, and obtain appropriate permissions when adapting protected styles or materials. By using ACE-Step, you agree to uphold these principles and respect artistic integrity, cultural diversity, and legal compliance. The authors are not responsible for any misuse of the model, including but not limited to copyright violations, cultural insensitivity, or the generation of harmful content.
🔔 Important Notice
The only official website for the ACE-Step project is our GitHub Pages site.
We do not operate any other websites.
🚫 Fake domains include but are not limited to:
ac\*\*p.com, a\*\*p.org, a\*\*\*c.org
⚠️ Please be cautious. Do not visit, trust, or make payments on any of those sites.
## 🙏 Acknowledgements
This project is co-led by ACE Studio and StepFun.
## 📖 Citation
If you find this project useful for your research, please consider citing:
```BibTeX
@misc{gong2026acestep,
title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
author={Junmin Gong and Song Yulin and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
year={2026},
note={GitHub repository}
}
```