license: apache-2.0
TruthShield VoiceGen
Multi-Speaker, Multilingual TTS with Accent & Style Transfer
Overview
TruthShield VoiceGen is a text-to-speech system that supports 11 languages and offers voice cloning, accent transfer, and style control. It follows safety-first principles: generated audio is checked with ECAPA-TDNN forensic speaker verification.
Features
- 11 Languages: Hindi, Bengali, Telugu, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
- Voice Cloning: Clone voices from short reference audio
- Accent Transfer: Transfer accents while preserving content
- Style Control: Adjust speaking style and emotion
- Safety Verification: ECAPA-TDNN forensic verification
Quick Start
Installation
git clone https://github.com/truthshield/voicegen.git
cd voicegen
pip install -r requirements.txt
Run Server
uvicorn server:app --host 0.0.0.0 --port 8080
API Usage
curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
-F "speaker_wav=@speaker.wav" \
--output output.wav
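The same call can be made from Python. The sketch below uses only the standard library and mirrors the curl command: query parameters for `text` and `lang`, plus a multipart `speaker_wav` file part on a GET request. The function names `build_inference_request` and `synthesize` are hypothetical helpers, and the base URL assumes the server was started with the uvicorn command above.

```python
import urllib.parse
import urllib.request
import uuid

BASE_URL = "http://localhost:8080"  # assumption: matches the uvicorn command above


def build_inference_request(text, lang, wav_bytes, base_url=BASE_URL):
    """Build a GET /Get_Inference request with query params and a multipart file."""
    # quote_via=quote encodes spaces as %20, as in the curl example.
    query = urllib.parse.urlencode(
        {"text": text, "lang": lang}, quote_via=urllib.parse.quote
    )
    url = f"{base_url}/Get_Inference?{query}"

    # Hand-built multipart/form-data body carrying the speaker_wav field.
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="speaker_wav"; filename="speaker.wav"\r\n',
        b"Content-Type: audio/wav\r\n\r\n",
        wav_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="GET")


def synthesize(text, lang, speaker_path, out_path="output.wav"):
    """Send the request, save the returned audio, and return the response headers."""
    with open(speaker_path, "rb") as f:
        request = build_inference_request(text, lang, f.read())
    with urllib.request.urlopen(request) as response:
        with open(out_path, "wb") as out:
            out.write(response.read())
        return dict(response.headers)
```

With the server running, `synthesize("hello world", "english", "speaker.wav")` would write `output.wav` and return the response headers as a dictionary.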
API Specification
Endpoint: GET /Get_Inference
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | query | Yes | Text to synthesize |
| lang | query | Yes | Language name in lowercase (see Supported Languages) |
| speaker_wav | file | Yes | Reference speaker audio (WAV) |
Supported Languages
bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu
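A client can check the `lang` value before sending a request. This is a minimal sketch: the set is copied from the list above, and `normalize_lang` is a hypothetical helper. Lowercasing and trimming the input is a client-side convenience; whether the server itself accepts other casings is not stated here.

```python
SUPPORTED_LANGUAGES = frozenset({
    "bhojpuri", "bengali", "english", "gujarati", "hindi", "chhattisgarhi",
    "kannada", "magahi", "maithili", "marathi", "telugu",
})


def normalize_lang(lang: str) -> str:
    """Return the lowercase language name, or raise ValueError if unsupported."""
    candidate = lang.strip().lower()
    if candidate not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language {lang!r}; expected one of: "
            + ", ".join(sorted(SUPPORTED_LANGUAGES))
        )
    return candidate
```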
Response Headers
- X-Model-Version: Model version string
- X-Speaker-Similarity: Voice similarity score
- X-Safety-Verified: Safety verification status
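The response headers can be read into a small typed structure on the client side. In this sketch the header names come from this section, but the value formats are assumptions: the similarity score is taken to be a decimal string and the safety status a boolean-like string. `InferenceMetadata` and `parse_inference_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class InferenceMetadata:
    model_version: Optional[str]
    speaker_similarity: Optional[float]
    safety_verified: Optional[bool]


def parse_inference_headers(headers: Mapping[str, str]) -> InferenceMetadata:
    """Extract the three documented headers; missing headers become None."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    similarity = lowered.get("x-speaker-similarity")
    verified = lowered.get("x-safety-verified")
    return InferenceMetadata(
        model_version=lowered.get("x-model-version"),
        # Assumption: the score is sent as a decimal string such as "0.91".
        speaker_similarity=float(similarity) if similarity is not None else None,
        # Assumption: the status is a boolean-like string such as "true".
        safety_verified=(verified.lower() in {"true", "1", "yes"})
        if verified is not None
        else None,
    )
```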
Architecture
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│   Text   │──▶│ Phoneme  │──▶│   VITS   │──▶│  Safety  │
│  Input   │   │ Encoder  │   │ Encoder  │   │  Layer   │
└──────────┘   └──────────┘   └──────────┘   └────┬─────┘
                                                  │
┌──────────┐   ┌──────────────┐   ┌───────────────▼──────┐
│  Audio   │◀──│   WAV Out    │◀──│   HiFiGAN Vocoder    │
│  Output  │   │  + Headers   │   │                      │
└──────────┘   └──────────────┘   └──────────────────────┘
Safety Layer
All generated audio passes through ECAPA-TDNN speaker verification:
- Extract speaker embeddings from reference
- Generate audio using VITS
- Extract embeddings from generated audio
- Compute similarity score
- Apply threshold (0.85) for verification
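The last two steps can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the similarity metric is assumed to be cosine similarity (common for speaker embeddings), a score at or above 0.85 is assumed to count as verified, and embeddings are plain lists of floats. Extracting the embeddings with an ECAPA-TDNN model is outside the scope of the sketch.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.85  # threshold value stated in the list above


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_verified(
    reference_embedding: Sequence[float],
    generated_embedding: Sequence[float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> bool:
    # Assumption: a score at or above the threshold counts as verified.
    return cosine_similarity(reference_embedding, generated_embedding) >= threshold
```

For example, the vectors `[3.0, 4.0]` and `[4.0, 3.0]` have a cosine similarity of 0.96 and would pass, while `[1.0, 0.0]` and `[1.0, 1.0]` score about 0.71 and would not.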
Datasets
See datasets.csv for training data sources.
License
Apache 2.0
Citation
@misc{truthshield2024voicegen,
title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
author={TruthShield Team},
year={2024}
}