ChuxiJ committed on
Commit 1ff710f · 1 Parent(s): f41792a

feat: update thinking mode

Files changed (2)
  1. API.md +55 -13
  2. acestep/api_server.py +80 -92
API.md CHANGED
@@ -39,9 +39,30 @@ Suitable for passing only text parameters, or referencing audio file paths that
39
  | :--- | :--- | :--- | :--- |
40
  | `caption` | string | `""` | Music description prompt |
41
  | `lyrics` | string | `""` | Lyrics content |
 
42
  | `vocal_language` | string | `"en"` | Lyrics language (en, zh, ja, etc.) |
43
  | `audio_format` | string | `"mp3"` | Output format (mp3, wav, flac) |
44
 
45
  **Music Attribute Parameters**:
46
 
47
  | Parameter Name | Type | Default | Description |
@@ -51,6 +72,12 @@ Suitable for passing only text parameters, or referencing audio file paths that
51
  | `time_signature` | string | `""` | Time signature (e.g., "4/4") |
52
  | `audio_duration` | float | null | Generation duration (seconds) |
53
 
54
  **Generation Control Parameters**:
55
 
56
  | Parameter Name | Type | Default | Description |
@@ -61,20 +88,19 @@ Suitable for passing only text parameters, or referencing audio file paths that
61
  | `seed` | int | `-1` | Specify seed (when use_random_seed=false) |
62
  | `batch_size` | int | null | Batch generation count |
63
 
64
- **5Hz LM Parameters (Optional, server-side codes generation)**:
65
 
66
- If you want the server to generate `audio_code_string` using the 5Hz LM (equivalent to Gradio's **Generate LM Hints** button), set `use_5hz_lm=true`.
67
 
68
  | Parameter Name | Type | Default | Description |
69
  | :--- | :--- | :--- | :--- |
70
- | `use_5hz_lm` | bool | `false` | Enable server-side 5Hz LM code generation |
71
  | `lm_model_path` | string | null | 5Hz LM checkpoint dir name (e.g. `acestep-5Hz-lm-0.6B`) |
72
  | `lm_backend` | string | `"vllm"` | `vllm` or `pt` |
73
- | `lm_temperature` | float | `0.6` | Sampling temperature |
74
- | `lm_cfg_scale` | float | `1.0` | CFG scale (>1 enables CFG) |
75
  | `lm_negative_prompt` | string | `"NO USER INPUT"` | Negative prompt used by CFG |
76
  | `lm_top_k` | int | null | Top-k (0/null disables) |
77
- | `lm_top_p` | float | null | Top-p (>=1/null disables) |
78
  | `lm_repetition_penalty` | float | `1.0` | Repetition penalty |
79
 
80
  **Edit/Reference Audio Parameters** (requires absolute path on server):
@@ -124,7 +150,7 @@ curl -X POST http://localhost:8001/v1/music/generate \
124
  }'
125
  ```
126
 
127
- **JSON Method (server-side 5Hz LM)**:
128
 
129
  ```bash
130
  curl -X POST http://localhost:8001/v1/music/generate \
@@ -132,22 +158,38 @@ curl -X POST http://localhost:8001/v1/music/generate \
132
  -d '{
133
  "caption": "upbeat pop song",
134
  "lyrics": "Hello world",
135
- "use_5hz_lm": true,
136
- "lm_temperature": 0.6,
137
- "lm_cfg_scale": 1.0,
138
- "lm_top_k": 0,
139
- "lm_top_p": 1.0,
140
  "lm_repetition_penalty": 1.0
141
  }'
142
  ```
143
 
144
- When `use_5hz_lm=true` and the server generates LM codes, the job `result` will also include the following optional fields:
145
 
146
  - `bpm`
147
  - `duration`
148
  - `genres`
149
  - `keyscale`
150
  - `timesignature`
 
151
 
152
  > Note: If you use `curl -d` but **forget** to add `-H 'Content-Type: application/json'`, curl will default to sending `application/x-www-form-urlencoded`, and older server versions will return 415.
153
 
 
39
  | :--- | :--- | :--- | :--- |
40
  | `caption` | string | `""` | Music description prompt |
41
  | `lyrics` | string | `""` | Lyrics content |
42
+ | `thinking` | bool | `false` | Whether to use 5Hz LM to generate audio codes (lm-dit behavior). |
43
  | `vocal_language` | string | `"en"` | Lyrics language (en, zh, ja, etc.) |
44
  | `audio_format` | string | `"mp3"` | Output format (mp3, wav, flac) |
45
 
46
+ **thinking Semantics (Important)**:
47
+
48
+ - `thinking=false`:
49
+ - The server will **NOT** use 5Hz LM to generate `audio_code_string`.
50
+ - DiT runs in **text2music** mode and **ignores** any provided `audio_code_string`.
51
+ - `thinking=true`:
52
+ - The server will use 5Hz LM to generate `audio_code_string` (lm-dit behavior).
53
+ - DiT runs in **cover** mode and uses `audio_code_string`.
54
+
55
+ **Metadata Auto-Completion (Always On)**:
56
+
57
+ Regardless of `thinking`, if any of the following fields are missing, the server may call 5Hz LM to **fill only the missing fields** based on `caption`/`lyrics`:
58
+
59
+ - `bpm`
60
+ - `key_scale`
61
+ - `time_signature`
62
+ - `audio_duration`
63
+
64
+ User-provided values always win; LM only fills the fields that are empty/missing.
65
+
66
  **Music Attribute Parameters**:
67
 
68
  | Parameter Name | Type | Default | Description |
 
72
  | `time_signature` | string | `""` | Time signature (e.g., "4/4") |
73
  | `audio_duration` | float | null | Generation duration (seconds) |
74
 
75
+ **Audio Codes (Optional)**:
76
+
77
+ | Parameter Name | Type | Default | Description |
78
+ | :--- | :--- | :--- | :--- |
79
+ | `audio_code_string` | string or string[] | `""` | Audio semantic tokens (5Hz) for `llm_dit`. If provided as an array, it should match `batch_size` (or the server batch size). |
80
+
81
  **Generation Control Parameters**:
82
 
83
  | Parameter Name | Type | Default | Description |
 
88
  | `seed` | int | `-1` | Specify seed (when use_random_seed=false) |
89
  | `batch_size` | int | null | Batch generation count |
90
 
91
+ **5Hz LM Parameters (Optional, server-side)**:
92
 
93
+ These parameters control 5Hz LM sampling, used for metadata auto-completion and (when `thinking=true`) codes generation.
94
 
95
  | Parameter Name | Type | Default | Description |
96
  | :--- | :--- | :--- | :--- |
 
97
  | `lm_model_path` | string | null | 5Hz LM checkpoint dir name (e.g. `acestep-5Hz-lm-0.6B`) |
98
  | `lm_backend` | string | `"vllm"` | `vllm` or `pt` |
99
+ | `lm_temperature` | float | `0.85` | Sampling temperature |
100
+ | `lm_cfg_scale` | float | `2.0` | CFG scale (>1 enables CFG) |
101
  | `lm_negative_prompt` | string | `"NO USER INPUT"` | Negative prompt used by CFG |
102
  | `lm_top_k` | int | null | Top-k (0/null disables) |
103
+ | `lm_top_p` | float | `0.9` | Top-p (>=1 will be treated as disabled) |
104
  | `lm_repetition_penalty` | float | `1.0` | Repetition penalty |
105
 
106
  **Edit/Reference Audio Parameters** (requires absolute path on server):
 
150
  }'
151
  ```
152
 
153
+ **JSON Method (thinking=true: generate codes + fill missing metas)**:
154
 
155
  ```bash
156
  curl -X POST http://localhost:8001/v1/music/generate \
 
158
  -d '{
159
  "caption": "upbeat pop song",
160
  "lyrics": "Hello world",
161
+ "thinking": true,
162
+ "lm_temperature": 0.85,
163
+ "lm_cfg_scale": 2.0,
164
+ "lm_top_k": null,
165
+ "lm_top_p": 0.9,
166
  "lm_repetition_penalty": 1.0
167
  }'
168
  ```
169
 
170
+ **JSON Method (thinking=false: do NOT generate codes, but fill missing metas)**:
171
+
172
+ Example: user specifies `bpm` but omits `audio_duration`. The server may call LM to infer `duration` from `caption`/`lyrics` and use it only if the user did not set it.
173
+
174
+ ```bash
175
+ curl -X POST http://localhost:8001/v1/music/generate \
176
+ -H 'Content-Type: application/json' \
177
+ -d '{
178
+ "caption": "slow emotional ballad",
179
+ "lyrics": "...",
180
+ "thinking": false,
181
+ "bpm": 72
182
+ }'
183
+ ```
184
+
185
+ When the server invokes the 5Hz LM (to fill metas and/or generate codes), the job `result` may include the following optional fields:
186
 
187
  - `bpm`
188
  - `duration`
189
  - `genres`
190
  - `keyscale`
191
  - `timesignature`
192
+ - `metas` (raw-ish metadata dict)
193
 
194
  > Note: If you use `curl -d` but **forget** to add `-H 'Content-Type: application/json'`, curl will default to sending `application/x-www-form-urlencoded`, and older server versions will return 415.
195
 
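The "user-provided values always win" rule in the API.md changes above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the server's code: `fill_missing_metas` is a hypothetical name, and the LM metadata keys (`bpm`, `keyscale`, `timesignature`, `duration`) are assumed from the result fields listed in API.md. The real helper, `_maybe_fill_from_metadata`, is not shown in this diff.

```python
from typing import Any, Dict, Optional, Tuple


def fill_missing_metas(
    bpm: Optional[int],
    key_scale: str,
    time_signature: str,
    audio_duration: Optional[float],
    lm_meta: Dict[str, Any],
) -> Tuple[Optional[int], str, str, Optional[float]]:
    """Hypothetical helper: keep user values, fill only empty fields from LM metadata."""
    # A field counts as "missing" when it is None (numbers) or blank (strings).
    if bpm is None and lm_meta.get("bpm") is not None:
        bpm = int(lm_meta["bpm"])
    if not (key_scale or "").strip():
        key_scale = str(lm_meta.get("keyscale") or "")
    if not (time_signature or "").strip():
        time_signature = str(lm_meta.get("timesignature") or "")
    if audio_duration is None and lm_meta.get("duration") is not None:
        audio_duration = float(lm_meta["duration"])
    return bpm, key_scale, time_signature, audio_duration


# The user set bpm=72 and left the other fields empty; the LM suggests bpm=90.
lm_meta = {"bpm": 90, "keyscale": "A minor", "timesignature": "4/4", "duration": 180}
filled = fill_missing_metas(72, "", "", None, lm_meta)
print(filled)  # the user's bpm is kept; only the empty fields are filled
```

In this sketch the LM's `bpm` suggestion is discarded because the user already supplied one, which matches the precedence the documentation describes.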
acestep/api_server.py CHANGED
@@ -44,8 +44,11 @@ class GenerateMusicRequest(BaseModel):
44
  caption: str = Field(default="", description="Text caption describing the music")
45
  lyrics: str = Field(default="", description="Lyric text")
46
 
47
- # Match feishu bot semantics: `dit` (metas only) vs `llm_dit` (metas + audio codes)
48
- infer_type: Optional[Literal["dit", "llm_dit"]] = None
49
 
50
  bpm: Optional[int] = None
51
  key_scale: str = ""
@@ -77,8 +80,7 @@ class GenerateMusicRequest(BaseModel):
77
  audio_format: str = "mp3"
78
  use_tiled_decode: bool = True
79
 
80
- # 5Hz LM generation (server-side, like gradio's generate_lm_hints_wrapper)
81
- use_5hz_lm: bool = False
82
  lm_model_path: Optional[str] = None # e.g. "acestep-5Hz-lm-0.6B"
83
  lm_backend: Literal["vllm", "pt"] = "vllm"
84
 
@@ -99,15 +101,6 @@ _DEFAULT_DIT_INSTRUCTION = "Fill the audio semantic mask based on the given cond
99
  _DEFAULT_LM_INSTRUCTION = "Generate audio semantic tokens based on the given conditions:"
100
 
101
 
102
- def _normalize_infer_type(v: Any) -> Optional[str]:
103
- s = str(v or "").strip().lower()
104
- if not s:
105
- return None
106
- if s in {"dit", "llm_dit"}:
107
- return s
108
- return None
109
-
110
-
111
  class CreateJobResponse(BaseModel):
112
  job_id: str
113
  status: JobStatus
@@ -123,7 +116,7 @@ class JobResult(BaseModel):
123
  status_message: str = ""
124
  seed_value: str = ""
125
 
126
- # 5Hz LM metadata (present when `use_5hz_lm=true` and server generates codes)
127
  # Keep a raw-ish dict for clients that expect a `metas` object.
128
  metas: Dict[str, Any] = Field(default_factory=dict)
129
  bpm: Optional[int] = None
@@ -539,13 +532,7 @@ def create_app() -> FastAPI:
539
  time_sig_val = req.time_signature
540
  audio_duration_val = req.audio_duration
541
 
542
- # Infer type semantics: `dit` => metas only, `llm_dit` => metas + audio codes.
543
- # Default to llm_dit only when we actually have (or will generate) codes.
544
- explicit_infer = (req.infer_type or "").strip().lower() in {"dit", "llm_dit"}
545
- infer_type = (req.infer_type or "").strip().lower()
546
- if infer_type not in {"dit", "llm_dit"}:
547
- has_codes = bool(audio_code_string and str(audio_code_string).strip())
548
- infer_type = "llm_dit" if (req.use_5hz_lm or has_codes) else "dit"
549
 
550
  # If LM-generated code hints are used, a too-strong cover strength can suppress lyric/vocal conditioning.
551
  # We keep backward compatibility: only auto-adjust when user didn't override (still at default 1.0).
@@ -562,7 +549,16 @@ def create_app() -> FastAPI:
562
  effective_batch_size = 1
563
  effective_batch_size = max(1, int(effective_batch_size))
564
 
565
- if req.use_5hz_lm and not (audio_code_string and str(audio_code_string).strip()):
566
  # Lazy init 5Hz LM once
567
  with app.state._llm_init_lock:
568
  if getattr(app.state, "_llm_initialized", False) is False and getattr(app.state, "_llm_init_error", None) is None:
@@ -590,87 +586,81 @@ def create_app() -> FastAPI:
590
  app.state._llm_initialized = True
591
 
592
  if getattr(app.state, "_llm_init_error", None):
593
- raise RuntimeError(f"5Hz LM init failed: {app.state._llm_init_error}")
594
-
595
- def _lm_call() -> tuple[Dict[str, Any], str, str]:
596
- return llm.generate_with_stop_condition(
597
- caption=req.caption,
598
- lyrics=req.lyrics,
599
- infer_type=infer_type,
600
- temperature=float(req.lm_temperature),
601
- cfg_scale=max(1.0, float(req.lm_cfg_scale)),
602
- negative_prompt=str(req.lm_negative_prompt or "NO USER INPUT"),
603
- top_k=_normalize_optional_int(req.lm_top_k),
604
- top_p=_normalize_optional_float(req.lm_top_p),
605
- repetition_penalty=float(req.lm_repetition_penalty),
606
- )
607
-
608
- meta, codes, status = _lm_call()
609
-
610
- if infer_type == "llm_dit":
611
- if not codes:
612
- raise RuntimeError(f"5Hz LM generation failed: {status}")
613
-
614
- # LM once per job; rely on DiT seeds for batch diversity.
615
- # For convenience, replicate the same codes across the batch.
616
- if effective_batch_size > 1:
617
- # use the same codes for all in the batch
618
- audio_code_string = [codes] * effective_batch_size
619
-
620
- # If needed in future: call LM multiple times for more diverse codes.
621
- # codes_list: list[str] = [codes]
622
- # for _ in range(effective_batch_size - 1):
623
- # _m2, _c2, _s2 = _lm_call()
624
- # if not _c2:
625
- # raise RuntimeError(f"5Hz LM generation failed: {_s2}")
626
- # codes_list.append(_c2)
627
- # audio_code_string = codes_list
628
- else:
629
- audio_code_string = codes
630
-
631
- lm_fields = {
632
- "metas": _normalize_metas(meta),
633
- **_extract_lm_fields(meta),
634
- }
635
- bpm_val, key_scale_val, time_sig_val, audio_duration_val = _maybe_fill_from_metadata(req, meta)
636
-
637
- # If user provided long lyrics but LM didn't provide a usable duration, estimate a longer duration.
638
- if infer_type == "llm_dit" and audio_duration_val is None and (req.audio_duration is None):
639
- est = _estimate_duration_from_lyrics(req.lyrics)
640
- if est is not None:
641
- audio_duration_val = est
642
-
643
- # Optional: auto-tune LM cover strength (opt-in) to avoid suppressing lyric/vocal conditioning.
644
- if infer_type == "llm_dit" and audio_cover_strength_val >= 0.999 and (req.lyrics or "").strip():
645
- tuned = os.getenv("ACESTEP_LM_COVER_STRENGTH")
646
- if tuned is not None and tuned.strip() != "":
647
- audio_cover_strength_val = float(tuned)
648
-
649
- # Align behavior with feishu bot:
650
- # - dit: metas only (ignore audio codes), keep text2music.
651
- # - llm_dit: metas + audio codes, run in cover mode with LM instruction.
652
  instruction_val = req.instruction
653
  task_type_val = (req.task_type or "").strip() or "text2music"
654
 
655
- if infer_type == "dit":
656
  audio_code_string = ""
657
  if task_type_val == "cover":
658
  task_type_val = "text2music"
659
  if (instruction_val or "").strip() in {"", _DEFAULT_LM_INSTRUCTION}:
660
  instruction_val = _DEFAULT_DIT_INSTRUCTION
661
 
662
- if infer_type == "llm_dit":
663
  task_type_val = "cover"
664
  if (instruction_val or "").strip() in {"", _DEFAULT_DIT_INSTRUCTION}:
665
  instruction_val = _DEFAULT_LM_INSTRUCTION
666
 
667
  if not (audio_code_string and str(audio_code_string).strip()):
668
- if explicit_infer or req.use_5hz_lm:
669
- raise RuntimeError("llm_dit requires non-empty audio codes: provide 'audio_code_string' or set 'use_5hz_lm=true'.")
670
- # If not explicitly requested, fall back to dit semantics.
671
- infer_type = "dit"
672
- task_type_val = "text2music"
673
- instruction_val = _DEFAULT_DIT_INSTRUCTION
674
 
675
  first, second, paths, gen_info, status_msg, seed_value, *_ = h.generate_music(
676
  captions=req.caption,
@@ -779,7 +769,7 @@ def create_app() -> FastAPI:
779
  return GenerateMusicRequest(
780
  caption=str(get("caption", "") or ""),
781
  lyrics=str(get("lyrics", "") or ""),
782
- infer_type=_normalize_infer_type(get("infer_type")),
783
  bpm=_to_int(get("bpm"), None),
784
  key_scale=str(get("key_scale", "") or ""),
785
  time_signature=str(get("time_signature", "") or ""),
@@ -803,8 +793,6 @@ def create_app() -> FastAPI:
803
  cfg_interval_end=_to_float(get("cfg_interval_end"), 1.0) or 1.0,
804
  audio_format=str(get("audio_format", "mp3") or "mp3"),
805
  use_tiled_decode=_to_bool(get("use_tiled_decode"), True),
806
-
807
- use_5hz_lm=_to_bool(get("use_5hz_lm"), False),
808
  lm_model_path=str(get("lm_model_path") or "").strip() or None,
809
  lm_backend=str(get("lm_backend", "vllm") or "vllm"),
810
  lm_temperature=_to_float(get("lm_temperature"), _LM_DEFAULT_TEMPERATURE) or _LM_DEFAULT_TEMPERATURE,
 
44
  caption: str = Field(default="", description="Text caption describing the music")
45
  lyrics: str = Field(default="", description="Lyric text")
46
 
47
+ # New API semantics:
48
+ # - thinking=True: use 5Hz LM to generate audio codes (lm-dit behavior)
49
+ # - thinking=False: do not use LM to generate codes (dit behavior)
50
+ # Regardless of thinking, if some metas are missing, server may use LM to fill them.
51
+ thinking: bool = False
52
 
53
  bpm: Optional[int] = None
54
  key_scale: str = ""
 
80
  audio_format: str = "mp3"
81
  use_tiled_decode: bool = True
82
 
83
+ # 5Hz LM (server-side): used for metadata completion and (when thinking=True) codes generation.
 
84
  lm_model_path: Optional[str] = None # e.g. "acestep-5Hz-lm-0.6B"
85
  lm_backend: Literal["vllm", "pt"] = "vllm"
86
 
 
101
  _DEFAULT_LM_INSTRUCTION = "Generate audio semantic tokens based on the given conditions:"
102
 
103
 
104
  class CreateJobResponse(BaseModel):
105
  job_id: str
106
  status: JobStatus
 
116
  status_message: str = ""
117
  seed_value: str = ""
118
 
119
+ # 5Hz LM metadata (present when server invoked LM)
120
  # Keep a raw-ish dict for clients that expect a `metas` object.
121
  metas: Dict[str, Any] = Field(default_factory=dict)
122
  bpm: Optional[int] = None
 
532
  time_sig_val = req.time_signature
533
  audio_duration_val = req.audio_duration
534
 
535
+ thinking = bool(getattr(req, "thinking", False))
536
 
537
  # If LM-generated code hints are used, a too-strong cover strength can suppress lyric/vocal conditioning.
538
  # We keep backward compatibility: only auto-adjust when user didn't override (still at default 1.0).
 
549
  effective_batch_size = 1
550
  effective_batch_size = max(1, int(effective_batch_size))
551
 
552
+ has_codes = bool(audio_code_string and str(audio_code_string).strip())
553
+ need_lm_codes = bool(thinking) and (not has_codes)
554
+ need_lm_metas = (
555
+ (bpm_val is None)
556
+ or (not (key_scale_val or "").strip())
557
+ or (not (time_sig_val or "").strip())
558
+ or (audio_duration_val is None)
559
+ )
560
+
561
+ if need_lm_metas or need_lm_codes:
562
  # Lazy init 5Hz LM once
563
  with app.state._llm_init_lock:
564
  if getattr(app.state, "_llm_initialized", False) is False and getattr(app.state, "_llm_init_error", None) is None:
 
586
  app.state._llm_initialized = True
587
 
588
  if getattr(app.state, "_llm_init_error", None):
589
+ # If codes generation is required, fail hard.
590
+ if need_lm_codes:
591
+ raise RuntimeError(f"5Hz LM init failed: {app.state._llm_init_error}")
592
+ # Otherwise, skip LM best-effort (fallback to default/meta-less behavior)
593
+ else:
594
+ lm_infer = "llm_dit" if need_lm_codes else "dit"
595
+
596
+ def _lm_call() -> tuple[Dict[str, Any], str, str]:
597
+ return llm.generate_with_stop_condition(
598
+ caption=req.caption,
599
+ lyrics=req.lyrics,
600
+ infer_type=lm_infer,
601
+ temperature=float(req.lm_temperature),
602
+ cfg_scale=max(1.0, float(req.lm_cfg_scale)),
603
+ negative_prompt=str(req.lm_negative_prompt or "NO USER INPUT"),
604
+ top_k=_normalize_optional_int(req.lm_top_k),
605
+ top_p=_normalize_optional_float(req.lm_top_p),
606
+ repetition_penalty=float(req.lm_repetition_penalty),
607
+ )
608
+
609
+ meta, codes, status = _lm_call()
610
+
611
+ if need_lm_codes:
612
+ if not codes:
613
+ raise RuntimeError(f"5Hz LM generation failed: {status}")
614
+
615
+ # LM once per job; rely on DiT seeds for batch diversity.
616
+ # For convenience, replicate the same codes across the batch.
617
+ if effective_batch_size > 1:
618
+ audio_code_string = [codes] * effective_batch_size
619
+ else:
620
+ audio_code_string = codes
621
+
622
+ # Always expose LM metas when we invoked LM (even if user already set some fields).
623
+ lm_fields = {
624
+ "metas": _normalize_metas(meta),
625
+ **_extract_lm_fields(meta),
626
+ }
627
+
628
+ # Fill only missing fields (user-provided values win)
629
+ bpm_val, key_scale_val, time_sig_val, audio_duration_val = _maybe_fill_from_metadata(req, meta)
630
+
631
+ # If user provided lyrics but LM didn't provide a usable duration, estimate a longer duration.
632
+ if audio_duration_val is None and (req.audio_duration is None):
633
+ est = _estimate_duration_from_lyrics(req.lyrics)
634
+ if est is not None:
635
+ audio_duration_val = est
636
+
637
+ # Optional: auto-tune LM cover strength (opt-in) to avoid suppressing lyric/vocal conditioning.
638
+ if thinking and audio_cover_strength_val >= 0.999 and (req.lyrics or "").strip():
639
+ tuned = os.getenv("ACESTEP_LM_COVER_STRENGTH")
640
+ if tuned is not None and tuned.strip() != "":
641
+ audio_cover_strength_val = float(tuned)
642
+
643
+ # Align behavior:
644
+ # - thinking=False: metas only (ignore audio codes), keep text2music.
645
+ # - thinking=True: metas + audio codes, run in cover mode with LM instruction.
 
 
646
  instruction_val = req.instruction
647
  task_type_val = (req.task_type or "").strip() or "text2music"
648
 
649
+ if not thinking:
650
  audio_code_string = ""
651
  if task_type_val == "cover":
652
  task_type_val = "text2music"
653
  if (instruction_val or "").strip() in {"", _DEFAULT_LM_INSTRUCTION}:
654
  instruction_val = _DEFAULT_DIT_INSTRUCTION
655
 
656
+ if thinking:
657
  task_type_val = "cover"
658
  if (instruction_val or "").strip() in {"", _DEFAULT_DIT_INSTRUCTION}:
659
  instruction_val = _DEFAULT_LM_INSTRUCTION
660
 
661
  if not (audio_code_string and str(audio_code_string).strip()):
662
+ # thinking=True requires codes generation.
663
+ raise RuntimeError("thinking=true requires non-empty audio codes (LM generation failed).")
664
 
665
  first, second, paths, gen_info, status_msg, seed_value, *_ = h.generate_music(
666
  captions=req.caption,
 
769
  return GenerateMusicRequest(
770
  caption=str(get("caption", "") or ""),
771
  lyrics=str(get("lyrics", "") or ""),
772
+ thinking=_to_bool(get("thinking"), False),
773
  bpm=_to_int(get("bpm"), None),
774
  key_scale=str(get("key_scale", "") or ""),
775
  time_signature=str(get("time_signature", "") or ""),
 
793
  cfg_interval_end=_to_float(get("cfg_interval_end"), 1.0) or 1.0,
794
  audio_format=str(get("audio_format", "mp3") or "mp3"),
795
  use_tiled_decode=_to_bool(get("use_tiled_decode"), True),
 
 
796
  lm_model_path=str(get("lm_model_path") or "").strip() or None,
797
  lm_backend=str(get("lm_backend", "vllm") or "vllm"),
798
  lm_temperature=_to_float(get("lm_temperature"), _LM_DEFAULT_TEMPERATURE) or _LM_DEFAULT_TEMPERATURE,
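The decision logic this commit introduces in `api_server.py` can be summarized as a small standalone function. The sketch below is an illustration under stated assumptions: `LmPlan` and `plan_lm_usage` are hypothetical names that do not appear in the diff, and instruction handling is left out. It mirrors the `has_codes` / `need_lm_codes` / `need_lm_metas` expressions and the task-type switch shown in the new code.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class LmPlan:
    """Hypothetical container for the decisions the server makes per request."""
    need_lm_codes: bool  # LM must generate audio codes
    need_lm_metas: bool  # LM may fill missing metadata
    task_type: str       # task type passed on to DiT


def plan_lm_usage(
    thinking: bool,
    audio_code_string: Union[str, List[str], None],
    bpm: Optional[int],
    key_scale: str,
    time_signature: str,
    audio_duration: Optional[float],
    task_type: str = "text2music",
) -> LmPlan:
    # Codes are only generated when thinking is on and none were provided.
    has_codes = bool(audio_code_string and str(audio_code_string).strip())
    need_lm_codes = bool(thinking) and not has_codes
    # Metadata completion is considered regardless of thinking.
    need_lm_metas = (
        bpm is None
        or not (key_scale or "").strip()
        or not (time_signature or "").strip()
        or audio_duration is None
    )
    # thinking=True forces cover mode; thinking=False downgrades cover to text2music.
    if thinking:
        task_type = "cover"
    elif task_type == "cover":
        task_type = "text2music"
    return LmPlan(need_lm_codes, need_lm_metas, task_type)


# thinking=False with bpm set but duration missing: metas only, text2music.
metas_only = plan_lm_usage(False, "", 72, "", "", None)
# thinking=True with all metas provided: codes only, cover mode.
codes_only = plan_lm_usage(True, "", 120, "C major", "4/4", 30.0)
print(metas_only)
print(codes_only)
```

Keeping the two flags separate is what lets the server fail hard only when code generation is required and treat metadata completion as best-effort.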