Merge pull request #17 from ace-step/add_model_zoo

- README.md +24 -1
- acestep/acestep_v15_pipeline.py +1 -1
- acestep/api_server.py +4 -4
- assets/model_zoo.png +3 -0
- generate_examples.py +2 -2
- profile_inference.py +2 -2
README.md
CHANGED
@@ -19,6 +19,8 @@
 - [📦 Installation](#-installation)
 - [🚀 Usage](#-usage)
 - [🔨 Train](#-train)
+- [🏗️ Architecture](#️-architecture)
+- [🦁 Model Zoo](#-model-zoo)
 
 ## 📝 Abstract
 We present ACE-Step v1.5, a highly efficient foundation model that democratizes commercial-grade music production on consumer hardware. Optimized for local deployment (<4GB VRAM), the model accelerates generation by over 100× compared to traditional pure LM architectures, producing superior high-fidelity audio in seconds characterized by coherent semantics and exceptional melodies. At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints—scaling from short loops to 10-minute compositions—while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). Uniquely, this alignment is achieved through intrinsic reinforcement learning relying solely on the model’s internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities—such as cover generation, repainting, and vocal-to-BGM conversion—while maintaining strict adherence to prompts across 50+ languages.
@@ -31,7 +33,7 @@ We present ACE-Step v1.5, a highly efficient foundation model that democratizes
 </p>
 
 ### ⚡ Performance
-- ✅ **Ultra-Fast Generation** — 0.5s to 10s generation time (depending on think mode & diffusion steps)
+- ✅ **Ultra-Fast Generation** — 0.5s to 10s generation time on A100 (depending on think mode & diffusion steps)
 - ✅ **Flexible Duration** — Supports 10 seconds to 10 minutes (600s) audio generation
 - ✅ **Batch Generation** — Generate up to 8 songs simultaneously
 
@@ -159,7 +161,28 @@ See the **LoRA Training** tab in Gradio UI for one-click training, or check [Gra
   <img src="./assets/ACE-Step_framework.png" width="100%" alt="ACE-Step Framework">
 </p>
 
+## 🦁 Model Zoo
+
+<p align="center">
+  <img src="./assets/model_zoo.png" width="100%" alt="Model Zoo">
+</p>
+
+### DiT Models
+
+| DiT Model | Pre-Training | SFT | RL | CFG | Step | Refer audio | Text2Music | Cover | Repaint | Extract | Lego | Complete | Quality | Diversity | Fine-Tunability | Hugging Face |
+|-----------|:------------:|:---:|:--:|:---:|:----:|:-----------:|:----------:|:-----:|:-------:|:-------:|:----:|:--------:|:-------:|:---------:|:---------------:|--------------|
+| `acestep-v15-base` | ✅ | ❌ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | High | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-base) |
+| `acestep-v15-sft` | ✅ | ✅ | ❌ | ✅ | 50 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | High | Medium | Easy | [Link](https://huggingface.co/ACE-Step/acestep-v15-sft) |
+| `acestep-v15-turbo` | ✅ | ✅ | ❌ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | [Link](https://huggingface.co/ACE-Step/Ace-Step1.5) |
+| `acestep-v15-turbo-rl` | ✅ | ✅ | ✅ | ❌ | 8 | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | Very High | Medium | Medium | To be released |
+
+### LM Models
+
+| LM Model | Pretrain from | Pre-Training | SFT | RL | CoT metas | Query rewrite | Audio Understanding | Composition Capability | Copy Melody | Hugging Face |
+|----------|---------------|:------------:|:---:|:--:|:---------:|:-------------:|:-------------------:|:----------------------:|:-----------:|--------------|
+| `acestep-5Hz-lm-0.6B` | Qwen3-0.6B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Weak | ✅ |
+| `acestep-5Hz-lm-1.7B` | Qwen3-1.7B | ✅ | ✅ | ✅ | ✅ | ✅ | Medium | Medium | Medium | ✅ |
+| `acestep-5Hz-lm-4B` | Qwen3-4B | ✅ | ✅ | ✅ | ✅ | ✅ | Strong | Strong | Strong | To be released |
 
 ## 📜 License & Disclaimer
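The model identifiers in the tables above match the defaults touched elsewhere in this PR: `ACESTEP_CONFIG_PATH` / `--config_path` for the DiT checkpoints and `ACESTEP_LM_MODEL_PATH` / `--lm_model_path` for the 5Hz LMs. As a rough sketch only (the diff does not show a serving entry point), one might point the API server at a specific pair from the zoo like this, assuming `create_app()` from `acestep/api_server.py` is importable and that running it under uvicorn is acceptable:

```python
# Illustrative only: select a Model Zoo pair via the environment variables read
# in acestep/api_server.py. Running create_app() under uvicorn is an assumption,
# not something stated in this PR.
import os

# Set before the app is created, since the defaults are read at initialization.
os.environ["ACESTEP_CONFIG_PATH"] = "acestep-v15-turbo"      # DiT model (see table above)
os.environ["ACESTEP_LM_MODEL_PATH"] = "acestep-5Hz-lm-0.6B"  # 5Hz LM (see table above)
os.environ["ACESTEP_LM_BACKEND"] = "vllm"                    # "vllm" or "pt"
os.environ["ACESTEP_DEVICE"] = "auto"

import uvicorn
from acestep.api_server import create_app

uvicorn.run(create_app(), host="127.0.0.1", port=8000)
```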
acestep/acestep_v15_pipeline.py
CHANGED
@@ -129,7 +129,7 @@ def main():
     # Service initialization arguments
     parser.add_argument("--init_service", type=lambda x: x.lower() in ['true', '1', 'yes'], default=False, help="Initialize service on startup (default: False)")
     parser.add_argument("--checkpoint", type=str, default=None, help="Checkpoint file path (optional, for display purposes)")
-    parser.add_argument("--config_path", type=str, default=None, help="Main model path (e.g., 'acestep-v15-turbo
+    parser.add_argument("--config_path", type=str, default=None, help="Main model path (e.g., 'acestep-v15-turbo')")
     parser.add_argument("--device", type=str, default="auto", choices=["auto", "cuda", "cpu"], help="Processing device (default: auto)")
     parser.add_argument("--init_llm", type=lambda x: x.lower() in ['true', '1', 'yes'], default=True, help="Initialize 5Hz LM (default: True)")
     parser.add_argument("--lm_model_path", type=str, default=None, help="5Hz LM model path (e.g., 'acestep-5Hz-lm-0.6B')")
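The hunk above only fixes the `--config_path` help string, but it shows the boolean-flag convention this script uses (`type=lambda x: x.lower() in ['true', '1', 'yes']`). Below is a small standalone sketch of that pattern, kept separate from the real `main()` since the rest of the parser is not part of this diff:

```python
# Standalone sketch of the argument style used in acestep_v15_pipeline.py's main().
# Only the flags visible in the hunk are reproduced; everything else is omitted.
import argparse

def str2bool(x: str) -> bool:
    # Same truthy-string rule as the lambda in the diff:
    # "true"/"1"/"yes" -> True, anything else (including "false") -> False.
    return x.lower() in ['true', '1', 'yes']

parser = argparse.ArgumentParser()
parser.add_argument("--init_service", type=str2bool, default=False)
parser.add_argument("--checkpoint", type=str, default=None)
parser.add_argument("--config_path", type=str, default=None)
parser.add_argument("--device", type=str, default="auto", choices=["auto", "cuda", "cpu"])
parser.add_argument("--init_llm", type=str2bool, default=True)
parser.add_argument("--lm_model_path", type=str, default=None)

args = parser.parse_args(["--init_service", "yes", "--config_path", "acestep-v15-turbo"])
print(args.init_service, args.config_path)  # True acestep-v15-turbo
```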
acestep/api_server.py
CHANGED
@@ -608,7 +608,7 @@ def create_app() -> FastAPI:
     app.state.handler3 = handler3
     app.state._initialized2 = False
     app.state._initialized3 = False
-    app.state._config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo
+    app.state._config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo")
     app.state._config_path2 = config_path2
     app.state._config_path3 = config_path3
 
@@ -661,7 +661,7 @@ def create_app() -> FastAPI:
         raise RuntimeError(app.state._init_error)
 
     project_root = _get_project_root()
-    config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo
+    config_path = os.getenv("ACESTEP_CONFIG_PATH", "acestep-v15-turbo")
     device = os.getenv("ACESTEP_DEVICE", "auto")
 
     use_flash_attention = _env_bool("ACESTEP_USE_FLASH_ATTENTION", True)
@@ -868,7 +868,7 @@ def create_app() -> FastAPI:
 
     project_root = _get_project_root()
     checkpoint_dir = os.path.join(project_root, "checkpoints")
-    lm_model_path = (req.lm_model_path or os.getenv("ACESTEP_LM_MODEL_PATH") or "acestep-5Hz-lm-0.6B
+    lm_model_path = (req.lm_model_path or os.getenv("ACESTEP_LM_MODEL_PATH") or "acestep-5Hz-lm-0.6B").strip()
     backend = (req.lm_backend or os.getenv("ACESTEP_LM_BACKEND") or "vllm").strip().lower()
     if backend not in {"vllm", "pt"}:
         backend = "vllm"
@@ -1195,7 +1195,7 @@ def create_app() -> FastAPI:
         return s
 
     # Get model information
-    lm_model_name = os.getenv("ACESTEP_LM_MODEL_PATH", "acestep-5Hz-lm-0.6B
+    lm_model_name = os.getenv("ACESTEP_LM_MODEL_PATH", "acestep-5Hz-lm-0.6B")
     # Use selected_model_name (set at the beginning of _run_one_job)
     dit_model_name = selected_model_name
 
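All four hunks in this file apply the same pattern: a request field (where present) falls back to an environment variable and then to a hard-coded default, and the LM backend is additionally whitelisted to `{"vllm", "pt"}`. A minimal sketch of that resolution logic in isolation; the function name and signature below are illustrative and do not exist in `api_server.py`:

```python
# Illustrative helper mirroring the fallback chain used in the hunks above:
# request value -> environment variable -> hard-coded default.
import os
from typing import Optional, Tuple

def resolve_lm_settings(req_lm_model_path: Optional[str] = None,
                        req_lm_backend: Optional[str] = None) -> Tuple[str, str]:
    lm_model_path = (req_lm_model_path
                     or os.getenv("ACESTEP_LM_MODEL_PATH")
                     or "acestep-5Hz-lm-0.6B").strip()
    backend = (req_lm_backend or os.getenv("ACESTEP_LM_BACKEND") or "vllm").strip().lower()
    if backend not in {"vllm", "pt"}:
        backend = "vllm"  # unknown backends fall back to the default
    return lm_model_path, backend

# Expected output with the ACESTEP_* variables unset:
print(resolve_lm_settings(req_lm_backend="PT"))      # ('acestep-5Hz-lm-0.6B', 'pt')
print(resolve_lm_settings(req_lm_backend="sglang"))  # ('acestep-5Hz-lm-0.6B', 'vllm')
```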
assets/model_zoo.png
ADDED
(binary image, stored with Git LFS)
generate_examples.py
CHANGED
@@ -39,8 +39,8 @@ def generate_examples(num_examples=50, output_dir="examples/text2music", start_i
         logger.error("No 5Hz LM models found in checkpoints directory")
         return
 
-    # Prefer acestep-5Hz-lm-0.6B
-    lm_model = "acestep-5Hz-lm-0.6B
+    # Prefer acestep-5Hz-lm-0.6B if available
+    lm_model = "acestep-5Hz-lm-0.6B" if "acestep-5Hz-lm-0.6B" in available_models else available_models[0]
     logger.info(f"Using LM model: {lm_model}")
 
     # Initialize LM
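The change above replaces a hard-coded LM name with a preference-plus-fallback choice. A toy illustration of that selection behaviour with made-up model lists (the diff does not show how `available_models` is discovered):

```python
# Toy illustration of the selection logic added above; the sample lists are invented.
PREFERRED = "acestep-5Hz-lm-0.6B"

def pick_lm_model(available_models):
    # Prefer the 0.6B LM when present, otherwise fall back to the first entry.
    return PREFERRED if PREFERRED in available_models else available_models[0]

print(pick_lm_model(["acestep-5Hz-lm-1.7B", "acestep-5Hz-lm-0.6B"]))  # acestep-5Hz-lm-0.6B
print(pick_lm_model(["acestep-5Hz-lm-1.7B"]))                         # acestep-5Hz-lm-1.7B
```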
profile_inference.py
CHANGED
@@ -40,8 +40,8 @@ if project_root not in sys.path:
 def load_env_config():
     """Load configuration from the .env file"""
     env_config = {
-        'ACESTEP_CONFIG_PATH': 'acestep-v15-turbo
-        'ACESTEP_LM_MODEL_PATH': 'acestep-5Hz-lm-0.6B
+        'ACESTEP_CONFIG_PATH': 'acestep-v15-turbo',
+        'ACESTEP_LM_MODEL_PATH': 'acestep-5Hz-lm-0.6B',
         'ACESTEP_DEVICE': 'auto',
         'ACESTEP_LM_BACKEND': 'vllm',
     }
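This hunk only fixes the default values in `env_config`; the actual `.env` parsing in `profile_inference.py` is not part of the diff. A hedged sketch of what such a loader could look like, reusing the same keys and defaults, purely as an assumption about the surrounding code rather than the project's implementation:

```python
# Hedged sketch of a .env loader that overlays file values on the defaults shown
# in the hunk above. The parsing details are assumptions, not profile_inference.py's code.
import os

def load_env_config(env_file: str = ".env") -> dict:
    env_config = {
        'ACESTEP_CONFIG_PATH': 'acestep-v15-turbo',
        'ACESTEP_LM_MODEL_PATH': 'acestep-5Hz-lm-0.6B',
        'ACESTEP_DEVICE': 'auto',
        'ACESTEP_LM_BACKEND': 'vllm',
    }
    if os.path.exists(env_file):
        with open(env_file, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue  # skip blanks, comments, and malformed lines
                key, value = line.split("=", 1)
                if key.strip() in env_config:
                    env_config[key.strip()] = value.strip()
    return env_config
```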