# ACE-Step API Client Documentation

This service provides an HTTP-based asynchronous music generation API.

**Basic Workflow**:
1. Call `POST /v1/music/generate` to submit a task and obtain a `job_id`.
2. Call `GET /v1/jobs/{job_id}` to poll the task status until `status` is `succeeded` or `failed`.
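
The two-step workflow can be sketched in Python with only the standard library. This is a minimal illustration, not an official client. The base URL `http://localhost:8001` is taken from the cURL examples in this document. The helper names (`http_json`, `generate_and_wait`) and the polling interval and timeout are hypothetical choices. The HTTP call is passed in as `transport`, so the polling loop can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

BASE_URL = "http://localhost:8001"  # assumption: host/port used in the cURL examples

Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def http_json(method: str, url: str, body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Send a request (JSON body if given) and decode the JSON response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    req = urllib.request.Request(url, data=data, method=method, headers=headers)
    with urllib.request.urlopen(req) as resp:  # non-2xx raises urllib.error.HTTPError
        return json.load(resp)


def generate_and_wait(
    payload: Dict[str, Any],
    transport: Transport = http_json,
    base_url: str = BASE_URL,
    poll_interval: float = 2.0,  # hypothetical polling period (seconds)
    timeout: float = 600.0,      # hypothetical overall deadline (seconds)
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a task, then poll GET /v1/jobs/{job_id} until it succeeds or fails."""
    job_id = transport("POST", f"{base_url}/v1/music/generate", payload)["job_id"]
    deadline = time.monotonic() + timeout
    while True:
        info = transport("GET", f"{base_url}/v1/jobs/{job_id}", None)
        if info["status"] in ("succeeded", "failed"):
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {info['status']} after {timeout}s")
        sleep(poll_interval)
```

Against a live server, `generate_and_wait({"caption": "upbeat pop song", "lyrics": "Hello world"})` would block until the job reaches a terminal status and then return the final job object.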

---

## 1. Task Status Description

The task `status` field takes one of the following values:

- `queued`: The task is waiting in the queue. While it is queued, `queue_position` and `eta_seconds` report its place in line and the estimated wait.
- `running`: Generation is in progress.
- `succeeded`: Generation succeeded; results are in the `result` field.
- `failed`: Generation failed; error information is in the `error` field.
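
A client typically maps these four statuses to a progress message and stops polling on the two terminal ones. The sketch below assumes the response shapes shown in section 3. The helper names `is_terminal` and `describe_job` are hypothetical.

```python
from typing import Any, Dict

TERMINAL_STATUSES = {"succeeded", "failed"}  # polling can stop on these


def is_terminal(status: str) -> bool:
    return status in TERMINAL_STATUSES


def describe_job(info: Dict[str, Any]) -> str:
    """Turn a job-status response into a short progress line."""
    status = info["status"]
    if status == "queued":
        # queue_position / eta_seconds are only meaningful while queued
        pos = info.get("queue_position")
        eta = info.get("eta_seconds")
        eta_text = f", ~{eta:.0f}s" if eta is not None else ""
        return f"queued (position {pos}{eta_text})"
    if status == "running":
        return "running"
    if status == "succeeded":
        return f"succeeded ({len(info['result']['audio_paths'])} file(s))"
    if status == "failed":
        return f"failed: {info.get('error')}"
    raise ValueError(f"unknown status: {status!r}")
```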

---

## 2. Create Generation Task

### 2.1 API Definition

- **URL**: `/v1/music/generate`
- **Method**: `POST`
- **Content-Type**: `application/json`, `application/x-www-form-urlencoded`, or `multipart/form-data`

### 2.2 Request Parameters

#### Method A: JSON Request (application/json)

Use this method when sending only text parameters, or when referencing audio files that already exist on the server.

**Basic Parameters**:

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `caption` | string | `""` | Music description prompt |
| `lyrics` | string | `""` | Lyrics content |
| `thinking` | bool | `false` | Whether to use the 5Hz LM to generate audio codes (lm-dit behavior) |
| `vocal_language` | string | `"en"` | Lyrics language (en, zh, ja, etc.) |
| `audio_format` | string | `"mp3"` | Output format (mp3, wav, flac) |

**thinking Semantics (Important)**:

- `thinking=false`:
  - The server does **NOT** use the 5Hz LM to generate `audio_code_string`.
  - DiT runs in **text2music** mode and **ignores** any provided `audio_code_string`.
- `thinking=true`:
  - The server uses the 5Hz LM to generate `audio_code_string` (lm-dit behavior).
  - DiT runs in **cover** mode and uses `audio_code_string`.
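
The rules above can be restated as a small client-side check. One practical use is warning when a request supplies `audio_code_string` that the server will ignore because `thinking` is false. This sketch only mirrors the documented semantics; it does not call the server. The helper name `thinking_plan` and its output keys are hypothetical.

```python
from typing import Any, Dict


def thinking_plan(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize how the documented `thinking` flag affects a request."""
    thinking = bool(payload.get("thinking", False))  # documented default: false
    ignored = []
    if not thinking and payload.get("audio_code_string"):
        # thinking=false: DiT runs text2music and ignores provided codes
        ignored.append("audio_code_string")
    return {
        "lm_generates_codes": thinking,
        "dit_mode": "cover" if thinking else "text2music",
        "dit_uses_audio_codes": thinking,
        "ignored_fields": ignored,
    }
```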

**Metadata Auto-Completion (Always On)**:

Regardless of `thinking`, if any of the following fields are missing, the server may call the 5Hz LM to **fill only the missing fields** based on `caption`/`lyrics`:

- `bpm`
- `key_scale`
- `time_signature`
- `audio_duration`

User-provided values always take precedence; the LM fills only fields that are empty or missing.
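
The precedence rule amounts to a field-by-field merge. The sketch below treats a field as missing when it is `null` or an empty string, matching the request defaults listed in the next table. That interpretation is an assumption, and `fill_missing_metas` is a hypothetical name, not the server's code.

```python
from typing import Any, Dict

META_FIELDS = ("bpm", "key_scale", "time_signature", "audio_duration")


def is_missing(value: Any) -> bool:
    # Request defaults are null (bpm, audio_duration) or "" (key_scale, time_signature)
    return value is None or value == ""


def fill_missing_metas(user: Dict[str, Any], lm_guess: Dict[str, Any]) -> Dict[str, Any]:
    """User-provided values win; LM guesses fill only empty/missing fields."""
    merged = dict(user)
    for field in META_FIELDS:
        if is_missing(user.get(field)) and not is_missing(lm_guess.get(field)):
            merged[field] = lm_guess[field]
    return merged
```

For example, a user `bpm` of 72 survives even if the LM guesses 90, while an empty `key_scale` is replaced by the LM's value.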

**Music Attribute Parameters**:

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `bpm` | int | `null` | Tempo in beats per minute (BPM) |
| `key_scale` | string | `""` | Key/scale (e.g., "C Major") |
| `time_signature` | string | `""` | Time signature (e.g., "4/4") |
| `audio_duration` | float | `null` | Generation duration (seconds) |

**Audio Codes (Optional)**:

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `audio_code_string` | string or string[] | `""` | Audio semantic tokens (5Hz) for `llm_dit`. If provided as an array, its length should match `batch_size` (or the server's batch size). |
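
A client can catch a length mismatch before submitting. The sketch below checks the array form only when `batch_size` is set explicitly, because the server-side default batch size is not known to the client (an assumption). `check_audio_codes` is a hypothetical helper.

```python
from typing import List, Optional, Union


def check_audio_codes(codes: Union[str, List[str]], batch_size: Optional[int]) -> None:
    """Raise if a list of per-item codes does not line up with batch_size."""
    if isinstance(codes, str):
        return  # a single string is accepted as-is
    if batch_size is not None and len(codes) != batch_size:
        raise ValueError(
            f"audio_code_string has {len(codes)} entries but batch_size is {batch_size}"
        )
```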

**Generation Control Parameters**:

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `inference_steps` | int | `8` | Number of inference steps |
| `guidance_scale` | float | `7.0` | Prompt guidance coefficient |
| `use_random_seed` | bool | `true` | Whether to use a random seed |
| `seed` | int | `-1` | Fixed seed (used only when `use_random_seed` is `false`) |
| `batch_size` | int | `null` | Number of samples to generate in one batch |
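
Because `seed` is honored only when `use_random_seed` is `false`, pinning a seed means setting both fields. The hypothetical helper `with_fixed_seed` below returns an updated copy and leaves the original payload unchanged.

```python
from typing import Any, Dict


def with_fixed_seed(payload: Dict[str, Any], seed: int) -> Dict[str, Any]:
    """Return a copy of payload that pins the seed for reproducible output."""
    out = dict(payload)
    out["use_random_seed"] = False  # seed is only honored when this is false
    out["seed"] = seed
    return out
```

One possible use is re-running a job with the `seed_value` reported in a successful result. That value is returned as a string, so it would need converting with `int(...)`, assuming it holds a single seed.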

**5Hz LM Parameters (Optional, server-side)**:

These parameters control 5Hz LM sampling. The LM is used for metadata auto-completion and, when `thinking=true`, for audio-code generation.

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `lm_model_path` | string | `null` | 5Hz LM checkpoint directory name (e.g. `acestep-5Hz-lm-0.6B`) |
| `lm_backend` | string | `"vllm"` | `vllm` or `pt` |
| `lm_temperature` | float | `0.85` | Sampling temperature |
| `lm_cfg_scale` | float | `2.0` | CFG scale (>1 enables CFG) |
| `lm_negative_prompt` | string | `"NO USER INPUT"` | Negative prompt used by CFG |
| `lm_top_k` | int | `null` | Top-k (0 or null disables it) |
| `lm_top_p` | float | `0.9` | Top-p (values >= 1 disable it) |
| `lm_repetition_penalty` | float | `1.0` | Repetition penalty |
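
The "disabled" conventions in this table can be normalized on the client side. This is useful, for example, for logging which sampling filters will actually apply. The sketch restates the documented thresholds; it is not the server's sampler. The name `effective_lm_sampling` is hypothetical.

```python
from typing import Any, Dict, Optional


def effective_lm_sampling(
    lm_top_k: Optional[int] = None,
    lm_top_p: float = 0.9,
    lm_cfg_scale: float = 2.0,
) -> Dict[str, Any]:
    """Normalize LM sampling knobs per the table: disabled values become None."""
    return {
        "top_k": lm_top_k if lm_top_k else None,      # 0 or null disables top-k
        "top_p": lm_top_p if lm_top_p < 1 else None,  # >= 1 disables top-p
        "cfg_enabled": lm_cfg_scale > 1,              # CFG only when scale > 1
    }
```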

**Edit/Reference Audio Parameters** (paths must be absolute paths on the server):

| Parameter Name | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `reference_audio_path` | string | `null` | Reference audio path (style transfer) |
| `src_audio_path` | string | `null` | Source audio path (repainting/cover) |
| `task_type` | string | `"text2music"` | Task type (`text2music`, `cover`, `repaint`) |
| `instruction` | string | `"Fill..."` | Edit instruction |
| `repainting_start` | float | `0.0` | Repainting start time |
| `repainting_end` | float | `null` | Repainting end time |
| `audio_cover_strength` | float | `1.0` | Cover strength |
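
As a sketch, a repaint request that references a server-side file could be assembled as below. Two assumptions apply: the server uses POSIX paths (hence `posixpath.isabs`), and `repainting_end` should exceed `repainting_start`. The second is a client-side sanity check, not a documented server rule. `repaint_payload` is a hypothetical helper.

```python
import posixpath
from typing import Any, Dict, Optional


def repaint_payload(
    src_audio_path: str,
    caption: str,
    repainting_start: float = 0.0,
    repainting_end: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a JSON body for a repaint task on a file already on the server."""
    if not posixpath.isabs(src_audio_path):  # assumes a POSIX server
        raise ValueError("src_audio_path must be an absolute path on the server")
    if repainting_end is not None and repainting_end <= repainting_start:
        raise ValueError("repainting_end must be greater than repainting_start")
    payload: Dict[str, Any] = {
        "task_type": "repaint",
        "src_audio_path": src_audio_path,
        "caption": caption,
        "repainting_start": repainting_start,
    }
    if repainting_end is not None:
        payload["repainting_end"] = repainting_end
    return payload
```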

#### Method B: File Upload (multipart/form-data)

Use this when you need to upload local audio files as reference or source audio.

All of the fields above can also be sent as form fields. In addition, the following file fields are supported:

- `reference_audio`: (File) Upload reference audio file
- `src_audio`: (File) Upload source audio file

> **Note**: When a file is uploaded, the corresponding `*_path` parameter is ignored and the server uses the temporary path of the uploaded file instead.
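
Without a third-party HTTP library, a `multipart/form-data` body can be built by hand. The sketch below is a simplified encoder and makes no claim to match the server's expectations beyond the field names listed above. It does not escape quotes in field names or filenames, and `encode_multipart` is a hypothetical helper. Libraries such as `requests` handle this through their `files=` argument.

```python
import uuid
from typing import Dict, List, Tuple


def encode_multipart(
    fields: Dict[str, str], files: Dict[str, Tuple[str, bytes]]
) -> Tuple[bytes, str]:
    """Encode form fields and file fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex  # random boundary, unlikely to occur in the data
    chunks: List[bytes] = []
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(head.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    for name, (filename, content) in files.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        chunks.append(head.encode("utf-8") + content + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

The resulting `body` and content type could then be sent with `urllib.request.Request(url, data=body, headers={"Content-Type": ctype}, method="POST")`. Here `src_audio` would carry the bytes of a local file.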

### 2.3 Response Example

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "queue_position": 1
}
```

### 2.4 Usage Examples (cURL)

**JSON Method**:

```bash
curl -X POST http://localhost:8001/v1/music/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "caption": "upbeat pop song",
    "lyrics": "Hello world",
    "inference_steps": 16
  }'
```

**JSON Method (thinking=true: generate codes + fill missing metas)**:

```bash
curl -X POST http://localhost:8001/v1/music/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "caption": "upbeat pop song",
    "lyrics": "Hello world",
    "thinking": true,
    "lm_temperature": 0.85,
    "lm_cfg_scale": 2.0,
    "lm_top_k": null,
    "lm_top_p": 0.9,
    "lm_repetition_penalty": 1.0
  }'
```

**JSON Method (thinking=false: do NOT generate codes, but fill missing metas)**:

In this example, the user specifies `bpm` but omits `audio_duration`. The server may call the LM to infer a duration from `caption`/`lyrics`. The inferred value is used only because the user did not set `audio_duration`.

```bash
curl -X POST http://localhost:8001/v1/music/generate \
  -H 'Content-Type: application/json' \
  -d '{
    "caption": "slow emotional ballad",
    "lyrics": "...",
    "thinking": false,
    "bpm": 72
  }'
```

When the server invokes the 5Hz LM (to fill metas and/or generate codes), the job `result` may include the following optional fields:

- `bpm`
- `duration`
- `genres`
- `keyscale`
- `timesignature`
- `metas` (near-raw metadata dict)
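
These result fields use different names from the request parameters: `keyscale`, `timesignature`, and `duration` versus `key_scale`, `time_signature`, and `audio_duration`. Assuming they describe the same quantities, LM-inferred values could be copied into a follow-up request so the LM does not need to infer them again. Both the mapping and the helper `metas_for_resubmit` are assumptions based on the names alone.

```python
from typing import Any, Dict

# Assumed correspondence between LM result fields and request parameters
RESULT_TO_REQUEST = {
    "bpm": "bpm",
    "keyscale": "key_scale",
    "timesignature": "time_signature",
    "duration": "audio_duration",
}


def metas_for_resubmit(result: Dict[str, Any]) -> Dict[str, Any]:
    """Copy LM-inferred metadata into request-parameter names, skipping absent ones."""
    return {
        req_name: result[res_name]
        for res_name, req_name in RESULT_TO_REQUEST.items()
        if result.get(res_name) not in (None, "")
    }
```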

> Note: If you use `curl -d` but **forget** to add `-H 'Content-Type: application/json'`, curl sends the body as `application/x-www-form-urlencoded` by default, and older server versions respond with HTTP 415 (Unsupported Media Type).

**Form Method (no file upload, application/x-www-form-urlencoded)**:

```bash
curl -X POST http://localhost:8001/v1/music/generate \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'caption=upbeat pop song' \
  --data-urlencode 'lyrics=Hello world' \
  --data-urlencode 'inference_steps=16'
```
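
For comparison, the same form-encoded request can be built with Python's standard library. `urlencode` percent-encodes the values, turning spaces into `+`, and converts the integer with `str()`. Constructing the `Request` does not send anything; passing it to `urllib.request.urlopen` would.

```python
import urllib.request
from urllib.parse import urlencode

# Same fields as the form-urlencoded cURL example above
form = {"caption": "upbeat pop song", "lyrics": "Hello world", "inference_steps": 16}
body = urlencode(form)  # "caption=upbeat+pop+song&lyrics=Hello+world&inference_steps=16"

req = urllib.request.Request(
    "http://localhost:8001/v1/music/generate",
    data=body.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)
# urllib.request.urlopen(req) would submit the task and return the job JSON.
```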

**File Upload Method**:

```bash
curl -X POST http://localhost:8001/v1/music/generate \
  -F "caption=remix this song" \
  -F "src_audio=@/path/to/local/song.mp3" \
  -F "task_type=repaint"
```

---

## 3. Query Task Results

### 3.1 API Definition

- **URL**: `/v1/jobs/{job_id}`
- **Method**: `GET`

### 3.2 Response Parameters

The response contains basic task information, queue status, and final results.

**Main Fields**:

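- `job_id`: Task ID
- `status`: Current status
- `created_at` / `finished_at`: Unix timestamps (seconds) of task creation and completion
- `queue_position`: Current queue position (0 means running or completed)
- `eta_seconds`: Estimated remaining wait time (seconds)
- `result`: Result object, present when `status` is `succeeded`
  - `audio_paths`: List of generated audio file URLs/paths
  - `first_audio_path`: Preferred audio path (the first entry of `audio_paths`)
  - `second_audio_path`: Second generated audio path, if any
  - `generation_info`: Generation parameter details
  - `status_message`: Brief result description
  - `seed_value`: Seed value(s) used for generation, returned as a string
- `error`: Error information when `status` is `failed`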

### 3.3 Response Examples

**Queued**:

```json
{
  "job_id": "...",
  "status": "queued",
  "created_at": 1700000000.0,
  "queue_position": 5,
  "eta_seconds": 25.0,
  "result": null,
  "error": null
}
```

**Succeeded**:

```json
{
  "job_id": "...",
  "status": "succeeded",
  "created_at": 1700000000.0,
  "finished_at": 1700000010.0,
  "queue_position": 0,
  "result": {
    "first_audio_path": "/tmp/generated_1.mp3",
    "second_audio_path": "/tmp/generated_2.mp3",
    "audio_paths": ["/tmp/generated_1.mp3", "/tmp/generated_2.mp3"],
    "generation_info": "Steps: 8, Scale: 7.0 ...",
    "status_message": "✅ Generation completed successfully!",
    "seed_value": "12345"
  },
  "error": null
}
```
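
A client usually needs just the output paths, plus an error on failure. The sketch below works on the success example above, trimmed to the fields it uses. The helper names `audio_paths_or_raise` and `elapsed_seconds` are hypothetical. The example paths (`/tmp/...`) are server-local, so retrieving the audio itself depends on how the server exposes these URLs/paths. `elapsed_seconds` presumably includes queue time, since `created_at` is set at submission.

```python
from typing import Any, Dict, List, Optional

# The success example above, trimmed to the fields used here
example: Dict[str, Any] = {
    "status": "succeeded",
    "created_at": 1700000000.0,
    "finished_at": 1700000010.0,
    "result": {
        "audio_paths": ["/tmp/generated_1.mp3", "/tmp/generated_2.mp3"],
        "seed_value": "12345",
    },
    "error": None,
}


def audio_paths_or_raise(info: Dict[str, Any]) -> Optional[List[str]]:
    """Return generated audio paths, None if still pending, or raise on failure."""
    status = info["status"]
    if status == "succeeded":
        return list(info["result"]["audio_paths"])
    if status == "failed":
        raise RuntimeError(f"generation failed: {info.get('error')}")
    return None  # queued or running


def elapsed_seconds(info: Dict[str, Any]) -> Optional[float]:
    """Seconds from task creation to completion, if the task has finished."""
    if info.get("finished_at") is None:
        return None
    return info["finished_at"] - info["created_at"]
```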