ACE-Step API Client Documentation
This service provides an HTTP-based asynchronous music generation API.
Basic Workflow:
- Call `POST /v1/music/generate` to submit a task and obtain a `job_id`.
- Call `GET /v1/jobs/{job_id}` to poll the task status until `status` is `succeeded` or `failed`.
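The submit-then-poll workflow can be sketched as a small polling loop. This is a minimal sketch rather than an official client: `wait_for_job` and `fetch_job` are hypothetical names, `fetch_job` stands for any callable that performs `GET /v1/jobs/{job_id}` and returns the decoded JSON body, and the `interval` and `timeout` defaults are arbitrary assumptions.

```python
import time
from typing import Any, Callable, Dict

# The two statuses at which the documentation says polling should stop.
TERMINAL_STATUSES = {"succeeded", "failed"}


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 2.0,   # assumed polling interval in seconds
    timeout: float = 600.0,  # assumed overall limit in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a job until its status is terminal, then return the last response."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        sleep(interval)
```

Passing the HTTP call in as `fetch_job` keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.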
1. Task Status Description
Task status (`status`) includes the following types:
- `queued`: Task has entered the queue and is waiting to be executed. You can check `queue_position` and `eta_seconds` at this time.
- `running`: Generation is in progress.
- `succeeded`: Generation succeeded, results are in the `result` field.
- `failed`: Generation failed, error information is in the `error` field.
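A client usually branches on these four statuses. The following is a minimal sketch, assuming the job response has already been decoded into a dict; `describe_job` is a hypothetical helper, and the exact shape of the `error` field is not specified here, so it is simply formatted as-is.

```python
from typing import Any, Dict


def describe_job(job: Dict[str, Any]) -> str:
    """Turn a job response into a one-line, human-readable summary."""
    status = job.get("status")
    if status == "queued":
        # queue_position and eta_seconds are only meaningful while queued.
        return (
            f"queued at position {job.get('queue_position')}, "
            f"ETA {job.get('eta_seconds')}s"
        )
    if status == "running":
        return "running"
    if status == "succeeded":
        return "succeeded"
    if status == "failed":
        return f"failed: {job.get('error')}"
    return f"unknown status: {status!r}"
```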
2. Create Generation Task
2.1 API Definition
- URL: `/v1/music/generate`
- Method: `POST`
- Content-Type: `application/json` or `multipart/form-data`
2.2 Request Parameters
Method A: JSON Request (application/json)
Suitable for passing only text parameters, or referencing audio file paths that already exist on the server.
Basic Parameters:
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| `caption` | string | `""` | Music description prompt |
| `lyrics` | string | `""` | Lyrics content |
| `thinking` | bool | `false` | Whether to use 5Hz LM to generate audio codes (lm-dit behavior). |
| `vocal_language` | string | `"en"` | Lyrics language (en, zh, ja, etc.) |
| `audio_format` | string | `"mp3"` | Output format (mp3, wav, flac) |
`thinking` Semantics (Important):
- `thinking=false`:
  - The server will NOT use 5Hz LM to generate `audio_code_string`.
  - DiT runs in text2music mode and ignores any provided `audio_code_string`.
- `thinking=true`:
  - The server will use 5Hz LM to generate `audio_code_string` (lm-dit behavior).
  - DiT runs in cover mode and uses `audio_code_string`.
Metadata Auto-Completion (Always On):
Regardless of thinking, if any of the following fields are missing, the server may call 5Hz LM to fill only the missing fields based on caption/lyrics:
- `bpm`
- `key_scale`
- `time_signature`
- `audio_duration`
User-provided values always win; LM only fills the fields that are empty/missing.
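The "user values win" rule can be modeled as a simple merge. This is an illustrative sketch of the described behavior, not the server's code: `merge_metas` and `is_missing` are hypothetical names, and treating `None` and `""` as "missing" is an assumption based on the documented defaults for these fields.

```python
from typing import Any, Dict

# The four fields that the documentation says may be auto-completed.
META_FIELDS = ("bpm", "key_scale", "time_signature", "audio_duration")


def is_missing(value: Any) -> bool:
    """Assumption: None and the empty string both count as 'not provided'."""
    return value is None or value == ""


def merge_metas(user: Dict[str, Any], lm_guess: Dict[str, Any]) -> Dict[str, Any]:
    """Keep every user-provided value; fall back to the LM guess otherwise."""
    merged = {}
    for field in META_FIELDS:
        user_value = user.get(field)
        merged[field] = lm_guess.get(field) if is_missing(user_value) else user_value
    return merged
```

For example, a request that sets only `bpm` keeps that tempo, while the other three fields take whatever the LM inferred.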
Music Attribute Parameters:
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| `bpm` | int | `null` | Specify tempo (BPM) |
| `key_scale` | string | `""` | Key/scale (e.g., "C Major") |
| `time_signature` | string | `""` | Time signature (e.g., "4/4") |
| `audio_duration` | float | `null` | Generation duration (seconds) |
Audio Codes (Optional):
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| `audio_code_string` | string or string[] | `""` | Audio semantic tokens (5Hz) for llm_dit. If provided as an array, it should match `batch_size` (or the server batch size). |
Generation Control Parameters:
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| `inference_steps` | int | `8` | Number of inference steps |
| `guidance_scale` | float | `7.0` | Prompt guidance coefficient |
| `use_random_seed` | bool | `true` | Whether to use random seed |
| `seed` | int | `-1` | Specify seed (when `use_random_seed=false`) |
| `batch_size` | int | `null` | Batch generation count |
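Because `seed` only applies when `use_random_seed=false`, the two fields are best set together. A minimal sketch of that pairing, using a hypothetical helper `seed_fields`:

```python
from typing import Any, Dict, Optional


def seed_fields(seed: Optional[int] = None) -> Dict[str, Any]:
    """Return the seed-related request fields for a random or a fixed seed."""
    if seed is None:
        # Let the server pick a seed.
        return {"use_random_seed": True}
    # A fixed seed is only honored when random seeding is switched off.
    return {"use_random_seed": False, "seed": seed}
```

The returned dict can be merged into a request payload, for example `{"caption": "upbeat pop song", **seed_fields(12345)}`.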
5Hz LM Parameters (Optional, server-side):
These parameters control 5Hz LM sampling, used for metadata auto-completion and (when thinking=true) codes generation.
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| `lm_model_path` | string | `null` | 5Hz LM checkpoint dir name (e.g. `acestep-5Hz-lm-0.6B`) |
| `lm_backend` | string | `"vllm"` | `vllm` or `pt` |
| `lm_temperature` | float | `0.85` | Sampling temperature |
| `lm_cfg_scale` | float | `2.0` | CFG scale (>1 enables CFG) |
| `lm_negative_prompt` | string | `"NO USER INPUT"` | Negative prompt used by CFG |
| `lm_top_k` | int | `null` | Top-k (0/null disables) |
| `lm_top_p` | float | `0.9` | Top-p (>=1 will be treated as disabled) |
| `lm_repetition_penalty` | float | `1.0` | Repetition penalty |
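Several of these parameters act as switches at particular values. The sketch below restates the thresholds from the table as code; `describe_lm_sampling` is a hypothetical client-side helper for sanity-checking a request, not part of the API.

```python
from typing import Dict, Optional


def describe_lm_sampling(
    lm_cfg_scale: float = 2.0,
    lm_top_k: Optional[int] = None,
    lm_top_p: float = 0.9,
) -> Dict[str, bool]:
    """Report which sampling features the given values would enable."""
    return {
        # CFG is enabled only for scales strictly greater than 1.
        "cfg_enabled": lm_cfg_scale > 1,
        # Top-k is disabled by 0 or null (None).
        "top_k_enabled": lm_top_k not in (None, 0),
        # Top-p values of 1 or more are treated as disabled.
        "top_p_enabled": lm_top_p < 1,
    }
```

With the documented defaults, CFG and top-p are active and top-k is off.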
Edit/Reference Audio Parameters (requires absolute path on server):
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| `reference_audio_path` | string | `null` | Reference audio path (Style Transfer) |
| `src_audio_path` | string | `null` | Source audio path (Repainting/Cover) |
| `task_type` | string | `"text2music"` | Task type (text2music, cover, repaint) |
| `instruction` | string | `"Fill..."` | Edit instruction |
| `repainting_start` | float | `0.0` | Repainting start time |
| `repainting_end` | float | `null` | Repainting end time |
| `audio_cover_strength` | float | `1.0` | Cover strength |
Method B: File Upload (multipart/form-data)
Use this when you need to upload local audio files as reference or source audio.
In addition to supporting all the above fields as Form Fields, the following file fields are also supported:
- `reference_audio`: (File) Upload reference audio file
- `src_audio`: (File) Upload source audio file

Note: After uploading files, the corresponding `_path` parameters will be automatically ignored, and the system will use the temporary file path after upload.
2.3 Response Example
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "queue_position": 1
}
2.4 Usage Examples (cURL)
JSON Method:
curl -X POST http://localhost:8001/v1/music/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "caption": "upbeat pop song",
    "lyrics": "Hello world",
    "inference_steps": 16
  }'
JSON Method (thinking=true: generate codes + fill missing metas):
curl -X POST http://localhost:8001/v1/music/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "caption": "upbeat pop song",
    "lyrics": "Hello world",
    "thinking": true,
    "lm_temperature": 0.85,
    "lm_cfg_scale": 2.0,
    "lm_top_k": null,
    "lm_top_p": 0.9,
    "lm_repetition_penalty": 1.0
  }'
JSON Method (thinking=false: do NOT generate codes, but fill missing metas):
Example: user specifies bpm but omits audio_duration. The server may call LM to infer duration from caption/lyrics and use it only if the user did not set it.
curl -X POST http://localhost:8001/v1/music/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "caption": "slow emotional ballad",
    "lyrics": "...",
    "thinking": false,
    "bpm": 72
  }'
When the server invokes the 5Hz LM (to fill metas and/or generate codes), the job result may include the following optional fields:
- `bpm`
- `duration`
- `genres`
- `keyscale`
- `timesignature`
- `metas` (raw-ish metadata dict)
Note: If you use `curl -d` but forget to add `-H 'Content-Type: application/json'`, curl will default to sending `application/x-www-form-urlencoded`, and older server versions will return 415.
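The same Content-Type pitfall applies to programmatic clients, so it helps to set the header explicitly. The following sketch builds the JSON request with Python's standard library; `build_generate_request` is a hypothetical helper, and the base URL is taken from the cURL examples and will differ per deployment.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8001"  # assumed host/port, as in the cURL examples


def build_generate_request(
    payload: Dict[str, Any], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /v1/music/generate."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/music/generate",
        data=body,
        # Explicit header so the body is not mistaken for form data.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the response body as JSON to read `job_id`.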
Form Method (no file upload, application/x-www-form-urlencoded):
curl -X POST http://localhost:8001/v1/music/generate \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'caption=upbeat pop song' \
  --data-urlencode 'lyrics=Hello world' \
  --data-urlencode 'inference_steps=16'
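For reference, an equivalent form body can be produced with Python's standard library. This is a small sketch with a hypothetical helper name, `encode_form`; it only encodes the fields and leaves sending the request, with the `application/x-www-form-urlencoded` header, to the caller.

```python
from typing import Any, Dict
from urllib.parse import urlencode


def encode_form(fields: Dict[str, Any]) -> bytes:
    """Encode request fields as an application/x-www-form-urlencoded body."""
    # urlencode converts non-string values with str() and escapes each pair.
    return urlencode(fields).encode("ascii")
```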
File Upload Method:
curl -X POST http://localhost:8001/v1/music/generate \
  -F "caption=remix this song" \
  -F "src_audio=@/path/to/local/song.mp3" \
  -F "task_type=repaint"
3. Query Task Results
3.1 API Definition
- URL: `/v1/jobs/{job_id}`
- Method: `GET`
3.2 Response Parameters
The response contains basic task information, queue status, and final results.
Main Fields:
- `status`: Current status
- `queue_position`: Current queue position (0 means running or completed)
- `eta_seconds`: Estimated remaining wait time (seconds)
- `result`: Result object when successful
  - `audio_paths`: List of generated audio file URLs/paths
  - `first_audio_path`: Preferred audio path
  - `generation_info`: Generation parameter details
  - `status_message`: Brief result description
- `error`: Error information when failed
3.3 Response Examples
Queued:
{
  "job_id": "...",
  "status": "queued",
  "created_at": 1700000000.0,
  "queue_position": 5,
  "eta_seconds": 25.0,
  "result": null,
  "error": null
}
Execution Successful:
{
  "job_id": "...",
  "status": "succeeded",
  "created_at": 1700000000.0,
  "finished_at": 1700000010.0,
  "queue_position": 0,
  "result": {
    "first_audio_path": "/tmp/generated_1.mp3",
    "second_audio_path": "/tmp/generated_2.mp3",
    "audio_paths": ["/tmp/generated_1.mp3", "/tmp/generated_2.mp3"],
    "generation_info": "Steps: 8, Scale: 7.0 ...",
    "status_message": "✅ Generation completed successfully!",
    "seed_value": "12345"
  },
  "error": null
}
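Once a job reaches a terminal status, a client typically pulls the audio paths out of `result`. The sketch below assumes the response has been decoded into a dict shaped like the examples in this section; `audio_paths_from_job` is a hypothetical helper, and falling back to `first_audio_path` when `audio_paths` is empty is an assumption, not documented behavior.

```python
from typing import Any, Dict, List


def audio_paths_from_job(job: Dict[str, Any]) -> List[str]:
    """Return the generated audio paths, or raise if the job has no result."""
    status = job.get("status")
    if status == "failed":
        raise RuntimeError(f"generation failed: {job.get('error')}")
    if status != "succeeded":
        raise ValueError(f"job is not finished yet (status={status!r})")
    result = job.get("result") or {}
    paths = result.get("audio_paths") or []
    # Assumed fallback: use the preferred path if the list is absent or empty.
    if not paths and result.get("first_audio_path"):
        paths = [result["first_audio_path"]]
    return list(paths)
```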