🔧 Fix voice cloning: use the actual BreezyVoice inference logic

- README.md +73 -6
- app.py +275 -0
- requirements.txt +17 -0
README.md
CHANGED

```diff
@@ -1,12 +1,79 @@
 ---
-title:
-emoji:
-colorFrom:
-colorTo:
+title: MediaTek BreezyVoice Voice Cloning
+emoji: 🎭
+colorFrom: purple
+colorTo: pink
 sdk: gradio
-sdk_version:
+sdk_version: "4.44.0"
 app_file: app.py
 pinned: false
+hardware: zerogpu
+startup_duration_timeout: 30m
+tags:
+  - voice-cloning
+  - zero-shot
+  - taiwanese-mandarin
+  - breezyvoice
+  - mediatek
 ---
 
-
+# 🎭 MediaTek BreezyVoice Voice Cloning
+
+## 📋 Overview
+
+A MediaTek BreezyVoice zero-shot voice-cloning system, optimized for Taiwanese Traditional Chinese. It uses advanced speech-synthesis technology to learn a voice's characteristics from just 5-20 seconds of reference speech, then synthesizes arbitrary text in that voice.
+
+## 🎯 Key Features
+
+- **Zero-shot cloning**: clone any voice directly, with no training
+- **Optimized for Taiwan**: tuned specifically for Taiwanese Mandarin and Traditional Chinese
+- **High-quality synthesis**: MediaTek's advanced speech-synthesis technology
+- **GPU acceleration**: fast processing on ZeroGPU
+
+## 🚀 Usage
+
+1. **Initialize**: click the "Initialize BreezyVoice" button to set up the model
+2. **Upload speech**: upload 5-20 seconds of clear Mandarin speech as the reference
+3. **Enter text**: enter what the cloned voice should say
+4. **Transcribe the reference** (optional): enter a transcript of the reference speech to improve quality
+5. **Start cloning**: click the "Start voice cloning" button
+
+## 💡 Tips for Best Results
+
+- 🎙️ **Audio quality**: make sure the reference speech is clear and free of noise
+- 📏 **Length**: 5-20 seconds of reference speech is recommended
+- 🗣️ **Delivery**: read naturally and clearly; there is no need to exaggerate
+- 📝 **Transcript**: providing a transcript of the reference speech can noticeably improve cloning quality
+
+## ⚡ Technical Specifications
+
+- **Model**: MediaTek BreezyVoice (full version)
+- **Hardware**: ZeroGPU (H200, 70GB VRAM)
+- **Supported formats**: WAV, MP3, M4A
+- **Language**: Traditional Chinese (Taiwan)
+- **Features**: zero-shot learning, real-time inference
+
+## 🔗 API Usage
+
+```python
+from gradio_client import Client, handle_file
+
+client = Client("sheep52031/breezyvoice-tts")
+
+# Upload the reference audio and the text to synthesize
+result = client.predict(
+    speaker_audio=handle_file("reference_audio.wav"),
+    content_text="要合成的文字內容",  # text to synthesize (Traditional Chinese)
+    speaker_transcription="參考語音轉錄 (可選)",  # reference transcript (optional)
+    api_name="/breezy_voice_clone"
+)
+
+synthesized_audio = result[0]  # synthesized speech (file path)
+status_info = result[1]  # processing status
+```
+
+## 📊 Use Cases
+
+- 🎬 **Video voice-over**: produce personalized narration
+- 🎤 **Voice assistants**: build AI assistants with a custom voice
+- 📚 **Audiobooks**: read text aloud in a specific voice
+- 🎮 **Game voice acting**: generate character voices
+- 🏢 **Business applications**: brand-specific voice systems
+
+## ⚠️ Terms of Use
+
+- For lawful use only; do not use it to imitate other people's voices for improper purposes
+- Obtain the consent of the voice's owner before use
+- Generated speech is intended for learning and research purposes only
```
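The README recommends a 5-20 second reference clip, but nothing in app.py enforces it. A minimal sketch of such a check, assuming the clip arrives as Gradio's `(sample_rate, samples)` tuple; the helper name `check_reference_duration` and its default bounds are illustrative, not part of the Space.

```python
def check_reference_duration(sample_rate: int, num_samples: int,
                             min_s: float = 5.0, max_s: float = 20.0) -> tuple[bool, str]:
    """Return (ok, message) for a reference clip of num_samples at sample_rate."""
    if sample_rate <= 0:
        return False, "invalid sample rate"
    duration = num_samples / sample_rate
    if duration < min_s:
        return False, f"reference too short: {duration:.1f}s < {min_s:.0f}s"
    if duration > max_s:
        return False, f"reference too long: {duration:.1f}s > {max_s:.0f}s"
    return True, f"reference length OK: {duration:.1f}s"


# 10 seconds of audio at 16 kHz (hypothetical input)
ok, msg = check_reference_duration(16000, 160000)
print(ok, msg)
```

In the app, `len(audio_data)` gives `num_samples` for both mono `(samples,)` and stereo `(samples, channels)` arrays, so the check could run before any file is written.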
app.py
ADDED
@@ -0,0 +1,275 @@
```python
"""
MediaTek BreezyVoice real voice-cloning Space
Implements real speech synthesis, based on a successful local test
"""

import gradio as gr
import spaces
import torch
import torchaudio
import tempfile
import os
import time
import subprocess
import sys
from pathlib import Path

# Global state
cosyvoice = None
bopomofo_converter = None
setup_completed = False
```
```python
@spaces.GPU(duration=300)
def setup_breezyvoice():
    """Set up the BreezyVoice environment and load the model."""
    global cosyvoice, bopomofo_converter, setup_completed

    if setup_completed:
        return "✅ BreezyVoice is ready"

    try:
        print("🔧 Setting up BreezyVoice...")

        # 1. Clone the BreezyVoice repository
        repo_path = "/tmp/BreezyVoice"
        if not os.path.exists(repo_path):
            print("📥 Downloading the BreezyVoice repository...")
            result = subprocess.run([
                "git", "clone",
                "https://github.com/mtkresearch/BreezyVoice.git",
                repo_path
            ], capture_output=True, text=True, timeout=300)

            if result.returncode != 0:
                raise Exception(f"Download failed: {result.stderr}")

        # 2. Add the repository to the module search path
        sys.path.insert(0, repo_path)

        # 3. Install required dependencies
        print("📦 Installing dependencies...")
        dependencies = [
            "g2pw", "WeTextProcessing", "opencc-python-reimplemented",
            "hydra-core", "HyperPyYAML", "conformer", "lightning",
            "diffusers", "einops"
        ]

        for dep in dependencies:
            print(f"Installing {dep}...")
            # Use the running interpreter's pip so packages land in the same environment
            result = subprocess.run(
                [sys.executable, "-m", "pip", "install", dep, "--no-cache-dir"],
                capture_output=True, timeout=120
            )
            if result.returncode != 0:
                print(f"⚠️ Failed to install {dep}, continuing...")

        # 4. Import the BreezyVoice modules
        try:
            from single_inference import CustomCosyVoice
            from g2pw import G2PWConverter
            print("✅ BreezyVoice modules imported")
        except ImportError as e:
            raise Exception(f"Module import failed: {e}")

        # 5. Load the model
        print("🔄 Loading the full BreezyVoice model...")
        cosyvoice = CustomCosyVoice("MediaTek-Research/BreezyVoice")
        bopomofo_converter = G2PWConverter()

        setup_completed = True
        print("✅ BreezyVoice setup complete!")

        # Report VRAM usage
        if torch.cuda.is_available():
            vram_used = torch.cuda.memory_allocated() / 1024**3
            return f"✅ BreezyVoice setup complete! VRAM used: {vram_used:.2f}GB"

        return "✅ BreezyVoice setup complete!"

    except Exception as e:
        print(f"❌ Setup failed: {str(e)}")
        return f"❌ Setup failed: {str(e)}"
```
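`setup_breezyvoice` guards against re-initialization with the module-level `setup_completed` flag, so two requests arriving together can both see the flag unset and start the expensive setup. A minimal plain-Python sketch of a lock-protected, run-once initializer, assuming initialization happens in the same process that later serves requests; `RunOnce` and `load_model` are hypothetical names standing in for the clone, install, and model-load steps.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class RunOnce(Generic[T]):
    """Run a loader at most once, even when get() is called from many threads."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._done = False

    def get(self) -> T:
        # Fast path: initialization already finished, no locking needed.
        if self._done:
            return self._value
        with self._lock:
            # Re-check under the lock: another thread may have finished first.
            if not self._done:
                # If the loader raises, _done stays False so a later call retries.
                self._value = self._loader()
                self._done = True
        return self._value


# Hypothetical loader standing in for the real setup work.
load_calls = []


def load_model():
    load_calls.append(1)
    return "model"


model_once = RunOnce(load_model)
threads = [threading.Thread(target=model_once.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(model_once.get(), len(load_calls))
```

Leaving `_done` unset on failure mirrors the app's behavior of retrying setup on the next click after an error.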
```python
@spaces.GPU(duration=180)
def breezy_voice_clone(speaker_audio, content_text, speaker_transcription=None):
    """Run BreezyVoice voice cloning."""
    global cosyvoice, bopomofo_converter

    if speaker_audio is None:
        return None, "❌ Please upload or record a reference voice first"

    if not content_text or not content_text.strip():
        return None, "❌ Please enter the text to synthesize"

    if not setup_completed or cosyvoice is None:
        setup_status = setup_breezyvoice()
        if "❌" in setup_status:
            return None, setup_status

    try:
        with tempfile.TemporaryDirectory() as temp_dir:
            # Prepare input and output paths
            input_audio_path = os.path.join(temp_dir, "speaker_voice.wav")
            output_audio_path = os.path.join(temp_dir, "cloned_voice.wav")

            # Save the reference audio. Gradio returns (sample_rate, data), where data is
            # usually int16 shaped (samples,) or (samples, channels); torchaudio.save
            # expects a float tensor shaped (channels, frames).
            ref_sample_rate, audio_data = speaker_audio
            waveform = torch.tensor(audio_data)
            if not torch.is_floating_point(waveform):
                # Scale integer PCM to [-1, 1] (int16: divide by 32768)
                waveform = waveform.float() / -torch.iinfo(waveform.dtype).min
            if waveform.ndim == 2:
                waveform = waveform.mean(dim=1)  # down-mix (samples, channels) to mono
            torchaudio.save(input_audio_path, waveform.unsqueeze(0), ref_sample_rate)
            ref_duration = waveform.shape[-1] / ref_sample_rate

            # Fall back to a default transcript if none was provided
            if not speaker_transcription or not speaker_transcription.strip():
                # "This is a reference clip used for voice-cloning analysis."
                speaker_transcription = "這是一段參考語音,用於語音克隆分析。"

            print(f"🎤 Text to synthesize: {content_text}")
            print(f"📝 Reference transcript: {speaker_transcription}")

            # Run speech synthesis
            synthesis_start = time.time()

            try:
                # Import BreezyVoice's single_inference function
                from single_inference import single_inference

                # Synthesize speech into output_audio_path
                single_inference(
                    speaker_prompt_audio_path=input_audio_path,
                    content_to_synthesize=content_text,
                    output_path=output_audio_path,
                    cosyvoice=cosyvoice,
                    bopomofo_converter=bopomofo_converter,
                    speaker_prompt_text_transcription=speaker_transcription
                )

                synthesis_time = time.time() - synthesis_start

                # Check that an output file was produced
                if os.path.exists(output_audio_path):
                    # Load the synthesized audio (its sample rate may differ from the reference)
                    synthesized_audio, sample_rate = torchaudio.load(output_audio_path)
                    synthesized_audio = synthesized_audio.numpy()

                    # Output duration and real-time factor
                    audio_duration = synthesized_audio.shape[1] / sample_rate
                    rtf = synthesis_time / audio_duration if audio_duration > 0 else float('inf')

                    # Report VRAM usage
                    vram_info = ""
                    if torch.cuda.is_available():
                        vram_used = torch.cuda.memory_allocated() / 1024**3
                        vram_info = f"💾 VRAM: {vram_used:.2f}GB"

                    status = f"""✅ Voice cloning succeeded!

🎙️ Reference audio: {ref_duration:.1f}s
📝 Synthesized text: {content_text}
⏱️ Synthesis time: {synthesis_time:.1f}s
🎵 Output length: {audio_duration:.1f}s
📊 RTF: {rtf:.3f} {'(real-time)' if rtf < 1.0 else '(not real-time)'}
{vram_info}
🤖 Model: MediaTek BreezyVoice (full version)"""

                    return (sample_rate, synthesized_audio[0]), status
                else:
                    return None, "❌ Speech synthesis failed: no output file was generated"

            except Exception as e:
                return None, f"❌ Speech synthesis failed: {str(e)}"

    except Exception as e:
        return None, f"❌ Processing error: {str(e)}"
```
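`breezy_voice_clone` writes the Gradio recording to a WAV file with `torchaudio.save`, which expects a `(channels, frames)` array, while `gr.Audio(type="numpy")` typically yields `int16` data shaped `(samples,)` or `(samples, channels)`. A minimal numpy sketch of that conversion, assuming signed-integer or float input; `to_mono_float` is a hypothetical helper, and averaging channels is just one simple down-mix choice.

```python
import numpy as np


def to_mono_float(data: np.ndarray) -> np.ndarray:
    """Convert Gradio-style numpy audio to a (1, frames) float32 array in [-1, 1]."""
    x = np.asarray(data)
    if np.issubdtype(x.dtype, np.signedinteger):
        # Scale signed PCM by its full-scale magnitude (32768 for int16).
        x = x.astype(np.float32) / float(-np.iinfo(x.dtype).min)
    elif np.issubdtype(x.dtype, np.floating):
        x = x.astype(np.float32)
    else:
        raise TypeError(f"unsupported audio dtype: {x.dtype}")
    if x.ndim == 2:
        # (samples, channels) -> (samples,) by averaging the channels.
        x = x.mean(axis=1)
    # Add a leading channel axis: (frames,) -> (1, frames).
    return x[np.newaxis, :]


stereo = np.array([[32767, -32768], [0, 0], [16384, 16384]], dtype=np.int16)
print(to_mono_float(stereo))
```

To hand the result to torchaudio, wrap it with `torch.from_numpy(...)`.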
```python
# Build the Gradio interface
with gr.Blocks(title="BreezyVoice Voice Cloning", theme=gr.themes.Soft()) as demo:
    gr.Markdown("# 🎭 MediaTek BreezyVoice Voice Cloning")
    gr.Markdown("**Zero-shot voice cloning** - optimized for Taiwanese Traditional Chinese")

    # Initialization status display
    setup_status = gr.Textbox(
        label="🔧 System status",
        value="⏳ Ready to initialize BreezyVoice...",
        interactive=False
    )

    # Initialization button
    init_btn = gr.Button("🚀 Initialize BreezyVoice", variant="primary")

    with gr.Row():
        with gr.Column(scale=1):
            gr.Markdown("### 🎙️ Step 1: Upload a reference voice")
            gr.Markdown("Upload 5-20 seconds of clear Mandarin speech as the voice reference")

            speaker_audio = gr.Audio(
                sources=["microphone", "upload"],
                type="numpy",
                label="Reference voice (5-20 s)"
            )

            gr.Markdown("### 📝 Step 2: Enter the text")
            content_text = gr.Textbox(
                lines=3,
                placeholder="Enter what the cloned voice should say...",
                label="Text to synthesize",
                # Default Mandarin sample: "Hi! This is Xiao Chen from 光鈦廣告. I'm an AI agent
                # created by 林家任, not a scam ring."
                value="哈囉!這裡是光鈦廣告的小陳啦,我是林家任創造出來的AI Agent,不是詐騙集團啦。"
            )

            gr.Markdown("### 🔤 Step 3: Reference transcript (optional)")
            speaker_transcription = gr.Textbox(
                lines=2,
                placeholder="If you know what the reference clip says, enter its transcript to improve quality...",
                label="Reference transcript (optional)",
                value=""
            )

            clone_btn = gr.Button("🎭 Start voice cloning", variant="primary", size="lg")

        with gr.Column(scale=1):
            gr.Markdown("### 🎵 Cloning result")

            result_audio = gr.Audio(
                label="Cloned voice",
                type="numpy"
            )

            result_status = gr.Textbox(
                label="📋 Processing status",
                lines=12,
                max_lines=15,
                interactive=False
            )

    # Instructions
    with gr.Accordion("📖 Instructions", open=False):
        gr.Markdown("""
## 🎯 Steps
1. **Initialize**: click "Initialize BreezyVoice" to set up the model
2. **Upload a voice**: upload 5-20 seconds of clear Mandarin speech as the reference
3. **Enter text**: enter what the cloned voice should say
4. **Start cloning**: click "Start voice cloning"

## 💡 Tips for best results
- 🎙️ Clear reference audio with no background noise
- 📏 Moderate length (5-20 seconds)
- 🗣️ Read naturally with clear pronunciation
- 📝 If you know what the reference clip says, entering its transcript improves quality

## ⚡ Technical highlights
- 🇹🇼 Optimized specifically for Taiwanese Traditional Chinese
- 🎯 Zero-shot cloning (no training required)
- ⚡ ZeroGPU-accelerated processing
- 🔊 MediaTek's advanced speech-synthesis technology
""")

    # Event wiring
    init_btn.click(
        fn=setup_breezyvoice,
        outputs=[setup_status]
    )

    clone_btn.click(
        fn=breezy_voice_clone,
        inputs=[speaker_audio, content_text, speaker_transcription],
        outputs=[result_audio, result_status]
    )

if __name__ == "__main__":
    demo.launch()
```
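The success message in `breezy_voice_clone` reports a real-time factor: RTF = synthesis wall-clock time / (output samples / sample rate), where RTF < 1 means audio is generated faster than it plays. A small sketch mirroring that formula; the function names and the 22.05 kHz example rate are illustrative assumptions.

```python
import math


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = synthesis wall-clock time / duration of the generated audio."""
    duration = num_samples / sample_rate if sample_rate > 0 else 0.0
    # No audio produced: report infinity, as the app does with float('inf').
    return synthesis_seconds / duration if duration > 0 else math.inf


def rtf_label(rtf: float) -> str:
    """Faster than real time when RTF < 1."""
    return "real-time" if rtf < 1.0 else "not real-time"


# 3 s of compute for 6 s of 22.05 kHz audio
rtf = real_time_factor(3.0, 6 * 22050, 22050)
print(f"RTF: {rtf:.3f} ({rtf_label(rtf)})")  # RTF: 0.500 (real-time)
```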
requirements.txt
ADDED
@@ -0,0 +1,17 @@
```text
gradio>=4.40.0
spaces>=0.28.0
torch>=2.0.0
torchaudio>=2.0.0
transformers>=4.40.0
soundfile>=0.12.1
numpy>=1.21.0
librosa>=0.10.0
g2pw
WeTextProcessing
opencc-python-reimplemented
hydra-core>=1.3.0
HyperPyYAML>=1.2.0
conformer>=0.3.0
pytorch-lightning
diffusers
einops
```
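`setup_breezyvoice` pip-installs packages at runtime, and most of them are already listed in requirements.txt, so on a normal build they should already be importable. A sketch of checking importability with `importlib.util.find_spec` before shelling out to pip. The distribution-to-module mapping below (for example `opencc-python-reimplemented` to `opencc`, `WeTextProcessing` to `tn`) is an assumption to verify against each package, and the command uses `sys.executable -m pip` so installs target the running interpreter.

```python
import importlib.util
import sys

# Assumed mapping from pip distribution name to its top-level import name;
# verify each entry against the package before relying on it.
DEPENDENCIES = {
    "g2pw": "g2pw",
    "WeTextProcessing": "tn",
    "opencc-python-reimplemented": "opencc",
    "hydra-core": "hydra",
    "HyperPyYAML": "hyperpyyaml",
    "conformer": "conformer",
    "lightning": "lightning",
    "diffusers": "diffusers",
    "einops": "einops",
}


def missing_distributions(deps: dict[str, str]) -> list[str]:
    """Return the distributions whose import module cannot be located."""
    return [dist for dist, module in deps.items()
            if importlib.util.find_spec(module) is None]


def pip_install_command(dists: list[str]) -> list[str]:
    """Build a pip command that targets the running interpreter's environment."""
    return [sys.executable, "-m", "pip", "install", "--no-cache-dir", *dists]


missing = missing_distributions(DEPENDENCIES)
if missing:
    print("Would run:", " ".join(pip_install_command(missing)))
else:
    print("All dependencies importable; skipping pip.")
```

The returned command list can be passed to `subprocess.run(...)` only when `missing` is non-empty, which avoids a slow install loop on every cold start.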