Gong Junmin committed on
Commit 1d01ac3 · unverified · 2 Parent(s): 2d762cb 5c9bcfb

Merge pull request #17 from ace-step/add_model_zoo

README.md CHANGED

```diff
@@ -19,6 +19,8 @@
 - [📦 Installation](#-installation)
 - [🚀 Usage](#-usage)
 - [🔨 Train](#-train)
+- [🏗️ Architecture](#️-architecture)
+- [🦁 Model Zoo](#-model-zoo)
 
 ## 📝 Abstract
 We present ACE-Step v1.5, a highly efficient foundation model that democratizes commercial-grade music production on consumer hardware. Optimized for local deployment (<4GB VRAM), the model accelerates generation by over 100× compared to traditional pure LM architectures, producing superior high-fidelity audio in seconds characterized by coherent semantics and exceptional melodies. At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints—scaling from short loops to 10-minute compositions—while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). Uniquely, this alignment is achieved through intrinsic reinforcement learning relying solely on the model’s internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities—such as cover generation, repainting, and vocal-to-BGM conversion—while maintaining strict adherence to prompts across 50+ languages.
@@ -31,7 +33,7 @@ We present ACE-Step v1.5, a highly efficient foundation model that democratizes
 </p>
 
 ### ⚡ Performance
-- ✅ **Ultra-Fast Generation** — 0.5s to 10s generation time (depending on think mode & diffusion steps)
+- ✅ **Ultra-Fast Generation** — 0.5s to 10s generation time on A100 (depending on think mode & diffusion steps)
 - ✅ **Flexible Duration** — Supports 10 seconds to 10 minutes (600s) audio generation
 - ✅ **Batch Generation** — Generate up to 8 songs simultaneously
 
@@ -159,7 +161,28 @@ See the **LoRA Training** tab in Gradio UI for one-click training, or check [Gra
 <img src="./assets/ACE-Step_framework.png" width="100%" alt="ACE-Step Framework">
 </p>
 
+## 🦁 Model Zoo
 
+<p align="center">
+<img src="./assets/model_zoo.png" width="100%" alt="Model Zoo">
+</p>
+
+### DiT Models
+
+| DiT Model | Pre-Training | SFT | RL | CFG | Step | Refer audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |
+|-----------|:------------:|:---:|:--:|:---:|:----:|:-----------:|:----------:|:-----:|:-------:|:-------:|:----:|:--------:|:-------:|:---------:|:---------------:|--------------|
+| `acestep-v15-base` | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | High | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-base) |
+| `acestep-v15-sft` | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | High | Medium | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-sft) |
+| `acestep-v15-turbo` | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | [Link](https://huggingface.co/ACE-Step/Ace-Step1.5) |
+| `acestep-v15-turbo-rl` | ✅ | ✅ | ✅ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | To be released |
+
+### LM Models
+
+| LM Model | Pretrain from | Pre-Training | SFT | RL | CoT metas | Query rewrite | Audio Understanding | Composition Capability | Copy Melody | Hugging Face |
+|----------|---------------|:------------:|:---:|:--:|:---------:|:-------------:|:-------------------:|:----------------------:|:-----------:|--------------|
+| `acestep-5Hz-lm-0.6B` | Qwen3-0.6B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Weak | ✅ |
+| `acestep-5Hz-lm-1.7B` | Qwen3-1.7B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Medium | ✅ |
+| `acestep-5Hz-lm-4B` | Qwen3-4B | ✅ | ✅ | ✅ | ✅ | ✅ | Strong | Strong | Strong | To be released |
 
 ## 📜 License & Disclaimer
 
```
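The new Model Zoo tables give each checkpoint a canonical name and, where released, a Hugging Face link. As a rough illustration only, a small Python lookup could mirror the DiT table; the `DIT_MODEL_ZOO` dict and `resolve_dit_link` helper below are hypothetical and not part of this commit.

```python
# Hypothetical lookup mirroring the DiT table above; names and links are
# copied from the README, the helper itself is illustrative only.
DIT_MODEL_ZOO = {
    "acestep-v15-base": "https://huggingface.co/ACE-Step/acestep-v15-base",
    "acestep-v15-sft": "https://huggingface.co/ACE-Step/acestep-v15-sft",
    "acestep-v15-turbo": "https://huggingface.co/ACE-Step/Ace-Step1.5",
    "acestep-v15-turbo-rl": None,  # listed as "To be released" in the table
}


def resolve_dit_link(name):
    """Return the Hugging Face link for a released DiT checkpoint."""
    link = DIT_MODEL_ZOO.get(name)
    if link is None:
        raise ValueError(f"{name!r} is unknown or not released yet")
    return link


print(resolve_dit_link("acestep-v15-turbo"))
```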
acestep/acestep_v15_pipeline.py CHANGED

```diff
@@ -129,7 +129,7 @@ def main():
     # Service initialization arguments
     parser.add_argument("--init_service", type=lambda x: x.lower() in ['true', '1', 'yes'], default=False, help="Initialize service on startup (default: False)")
     parser.add_argument("--checkpoint", type=str, default=None, help="Checkpoint file path (optional, for display purposes)")
-    parser.add_argument("--config_path", type=str, default=None, help="Main model path (e.g., 'acestep-v15-turbo-rl')")
+    parser.add_argument("--config_path", type=str, default=None, help="Main model path (e.g., 'acestep-v15-turbo')")
     parser.add_argument("--device", type=str, default="auto", choices=["auto", "cuda", "cpu"], help="Processing device (default: auto)")
     parser.add_argument("--init_llm", type=lambda x: x.lower() in ['true', '1', 'yes'], default=True, help="Initialize 5Hz LM (default: True)")
     parser.add_argument("--lm_model_path", type=str, default=None, help="5Hz LM model path (e.g., 'acestep-5Hz-lm-0.6B')")
```
acestep/api_server.py CHANGED

```diff
@@ -608,7 +608,7 @@ def create_app() -> FastAPI:
     app.state.handler3 = handler3
     app.state._initialized2 = False
     app.state._initialized3 = False
-    app.state._config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo-rl")
+    app.state._config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo")
     app.state._config_path2 = config_path2
     app.state._config_path3 = config_path3
 
@@ -661,7 +661,7 @@
     raise RuntimeError(app.state._init_error)
 
     project_root = _get_project_root()
-    config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo-rl")
+    config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo")
     device = os.getenv("ACESTEP_DEVICE", "auto")
 
     use_flash_attention = _env_bool("ACESTEP_USE_FLASH_ATTENTION", True)
@@ -868,7 +868,7 @@
 
     project_root = _get_project_root()
     checkpoint_dir = os.path.join(project_root, "checkpoints")
-    lm_model_path = (req.lm_model_path or os.getenv("ACESTEP_LM_MODEL_PATH") or "acestep-5Hz-lm-0.6B-v3").strip()
+    lm_model_path = (req.lm_model_path or os.getenv("ACESTEP_LM_MODEL_PATH") or "acestep-5Hz-lm-0.6B").strip()
     backend = (req.lm_backend or os.getenv("ACESTEP_LM_BACKEND") or "vllm").strip().lower()
     if backend not in {"vllm", "pt"}:
         backend = "vllm"
@@ -1195,7 +1195,7 @@
     return s
 
     # Get model information
-    lm_model_name = os.getenv("ACESTEP_LM_MODEL_PATH", "acestep-5Hz-lm-0.6B-v3")
+    lm_model_name = os.getenv("ACESTEP_LM_MODEL_PATH", "acestep-5Hz-lm-0.6B")
     # Use selected_model_name (set at the beginning of _run_one_job)
     dit_model_name = selected_model_name
 
```
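All four hunks swap the server defaults to the released checkpoints while keeping the same resolution order: explicit request value, then environment variable, then built-in default. A small sketch of that fallback chain with the new defaults follows; the helper functions are illustrative, not the server's own code.

```python
import os


def resolve_config_path():
    # Same pattern as app.state._config_path in the hunks above.
    return os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo")


def resolve_lm_model_path(req_lm_model_path=None):
    # Request value wins, then the environment variable, then the new default.
    return (req_lm_model_path
            or os.getenv("ACESTEP_LM_MODEL_PATH")
            or "acestep-5Hz-lm-0.6B").strip()


# With no environment overrides set, both resolve to the new defaults.
print(resolve_config_path())     # acestep-v15-turbo (when the env var is unset)
print(resolve_lm_model_path())   # acestep-5Hz-lm-0.6B (when the env var is unset)

# An explicit request value still overrides everything.
print(resolve_lm_model_path("acestep-5Hz-lm-1.7B"))  # acestep-5Hz-lm-1.7B
```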
assets/model_zoo.png ADDED

Git LFS Details

  • SHA256: a1c5bf28c11cf9983b52257bbbb9d05cadbba633dfa3687f2459016acf876e35
  • Pointer size: 131 Bytes
  • Size of remote file: 347 kB
generate_examples.py CHANGED

```diff
@@ -39,8 +39,8 @@ def generate_examples(num_examples=50, output_dir="examples/text2music", start_i
         logger.error("No 5Hz LM models found in checkpoints directory")
         return
 
-    # Prefer acestep-5Hz-lm-0.6B-v3 if available
-    lm_model = "acestep-5Hz-lm-0.6B-v3" if "acestep-5Hz-lm-0.6B-v3" in available_models else available_models[0]
+    # Prefer acestep-5Hz-lm-0.6B if available
+    lm_model = "acestep-5Hz-lm-0.6B" if "acestep-5Hz-lm-0.6B" in available_models else available_models[0]
     logger.info(f"Using LM model: {lm_model}")
 
     # Initialize LM
```
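The preference logic itself is unchanged; only the preferred checkpoint name drops the `-v3` suffix. Here is a self-contained sketch of the selection rule, with `available_models` as a stand-in for the script's checkpoint discovery.

```python
# Stand-in for the discovered checkpoints; in the script this list comes
# from scanning the checkpoints directory.
available_models = ["acestep-5Hz-lm-1.7B", "acestep-5Hz-lm-0.6B"]

# Same rule as the hunk above: take acestep-5Hz-lm-0.6B when it exists,
# otherwise fall back to the first discovered model.
preferred = "acestep-5Hz-lm-0.6B"
lm_model = preferred if preferred in available_models else available_models[0]
print(f"Using LM model: {lm_model}")  # -> acestep-5Hz-lm-0.6B
```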
profile_inference.py CHANGED

```diff
@@ -40,8 +40,8 @@ if project_root not in sys.path:
 def load_env_config():
     """Load configuration from the .env file"""
     env_config = {
-        'ACESTEP_CONFIG_PATH': 'acestep-v15-turbo-rl',
-        'ACESTEP_LM_MODEL_PATH': 'acestep-5Hz-lm-0.6B-v3',
+        'ACESTEP_CONFIG_PATH': 'acestep-v15-turbo',
+        'ACESTEP_LM_MODEL_PATH': 'acestep-5Hz-lm-0.6B',
         'ACESTEP_DEVICE': 'auto',
         'ACESTEP_LM_BACKEND': 'vllm',
     }
```
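The defaults in `load_env_config` now match the checkpoint names used elsewhere in this commit. Below is a minimal sketch of such a loader with the updated defaults; the `.env` parsing shown here is an assumption for illustration, not the repository's implementation.

```python
import os


def load_env_config(env_file=".env"):
    """Load configuration from a .env file, falling back to the new defaults."""
    # Defaults updated by this commit to the released Model Zoo checkpoints.
    config = {
        "ACESTEP_CONFIG_PATH": "acestep-v15-turbo",
        "ACESTEP_LM_MODEL_PATH": "acestep-5Hz-lm-0.6B",
        "ACESTEP_DEVICE": "auto",
        "ACESTEP_LM_BACKEND": "vllm",
    }
    # Hypothetical parsing of simple KEY=VALUE lines; comments and blanks skipped.
    if os.path.exists(env_file):
        with open(env_file, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, value = line.split("=", 1)
                config[key.strip()] = value.strip()
    return config


print(load_env_config())
```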