license: apache-2.0

TruthShield VoiceGen

Multi-Speaker, Multilingual TTS with Accent & Style Transfer

Overview

TruthShield VoiceGen is a text-to-speech system that supports 11 languages and offers voice cloning, accent transfer, and style control. It follows safety-first principles: generated audio is checked with ECAPA-TDNN forensic speaker verification.

Features

🌍 11 Languages: Hindi, Bengali, Telugu, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
🎤 Voice Cloning: Clone voices from short reference audio
🗣️ Accent Transfer: Transfer accents while preserving content
🎭 Style Control: Adjust speaking style and emotion
🛡️ Safety Verification: ECAPA-TDNN forensic verification

Quick Start

Installation

git clone https://github.com/truthshield/voicegen.git
cd voicegen
pip install -r requirements.txt

Run Server

uvicorn server:app --host 0.0.0.0 --port 8080

API Usage

curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
  -F "speaker_wav=@speaker.wav" \
  --output output.wav

API Specification

Endpoint: GET /Get_Inference

Parameter	Type	Required	Description
text	query	Yes	Text to synthesize
lang	query	Yes	Language name in lowercase, e.g. english (see Supported Languages)
speaker_wav	file	Yes	Reference speaker audio (WAV)
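
As a rough illustration of the parameters above, the query string can be assembled in Python with the standard library. The helper name `build_inference_url` and the base URL are assumptions made for this sketch and are not part of the project. Only the endpoint path and the `text` and `lang` parameter names come from the specification, and the `speaker_wav` file still has to be sent as a multipart upload.

```python
from urllib.parse import quote, urlencode


def build_inference_url(base_url: str, text: str, lang: str) -> str:
    """Return the GET /Get_Inference URL for the given text and language.

    Hypothetical helper: only the endpoint path and the `text` and `lang`
    parameter names are taken from the API specification.
    """
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode({"text": text, "lang": lang}, quote_via=quote)
    return f"{base_url.rstrip('/')}/Get_Inference?{query}"


if __name__ == "__main__":
    print(build_inference_url("http://localhost:8080", "hello world", "english"))
```

Percent-encoding the text this way keeps non-ASCII input, such as Devanagari or Bengali script, safe to place in a URL.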

Supported Languages

bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu
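
A client may want to reject unsupported values before sending a request. The sketch below assumes that lowercasing and trimming on the client side is acceptable; whether the server itself is case-sensitive is not documented here, and the names `SUPPORTED_LANGUAGES` and `normalize_lang` are hypothetical.

```python
# Values copied from the supported language list above.
SUPPORTED_LANGUAGES = frozenset({
    "bhojpuri", "bengali", "english", "gujarati", "hindi", "chhattisgarhi",
    "kannada", "magahi", "maithili", "marathi", "telugu",
})


def normalize_lang(lang: str) -> str:
    """Trim and lowercase `lang`, raising ValueError if it is not supported."""
    candidate = lang.strip().lower()
    if candidate not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    return candidate


if __name__ == "__main__":
    print(normalize_lang(" Hindi "))
```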

Response Headers

X-Model-Version: Model version string
X-Speaker-Similarity: Voice similarity score
X-Safety-Verified: Safety verification status
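
One possible way to read these headers on the client is sketched below. The value formats are assumptions, since this document does not specify them: the similarity score is taken to be a decimal string and the verification status to be "true" or "false". The names `InferenceMetadata` and `parse_inference_headers` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class InferenceMetadata:
    model_version: str
    speaker_similarity: float
    safety_verified: bool


def parse_inference_headers(headers: Mapping[str, str]) -> InferenceMetadata:
    """Extract the three documented response headers into a typed record."""
    # HTTP header names are case-insensitive, so compare them in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    return InferenceMetadata(
        model_version=lowered["x-model-version"],
        # Assumption: the score arrives as a decimal string such as "0.91".
        speaker_similarity=float(lowered["x-speaker-similarity"]),
        # Assumption: the status arrives as "true" or "false".
        safety_verified=lowered["x-safety-verified"].strip().lower() == "true",
    )
```

A missing header raises `KeyError` here, which makes an incomplete response visible instead of silently treating it as unverified.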

Architecture

┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│   Text   │──▶│ Phoneme  │──▶│   VITS   │──▶│  Safety  │
│  Input   │   │ Encoder  │   │ Encoder  │   │  Layer   │
└──────────┘   └──────────┘   └──────────┘   └────┬─────┘
                                                  │
┌──────────┐   ┌──────────────┐   ┌───────────────▼──────┐
│  Audio   │◀──│   WAV Out    │◀──│   HiFiGAN Vocoder    │
│  Output  │   │  + Headers   │   │                      │
└──────────┘   └──────────────┘   └──────────────────────┘

Safety Layer

All generated audio passes through ECAPA-TDNN speaker verification:

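1. Extract speaker embeddings from the reference audio.
2. Generate audio using VITS.
3. Extract speaker embeddings from the generated audio.
4. Compute the similarity score between the two sets of embeddings.
5. Compare the score against the 0.85 threshold to set the verification status.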
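
The scoring and threshold steps above can be sketched as follows. This is a minimal illustration and not the project's implementation: it assumes the similarity score is cosine similarity and that a score at or above the threshold counts as verified, neither of which is stated in this document. Embedding extraction with ECAPA-TDNN is not shown; plain lists of floats stand in for the embeddings, and the function names are hypothetical.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.85  # threshold value stated in this section


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_verified(
    reference: Sequence[float],
    generated: Sequence[float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> bool:
    """Assumption: verification passes when similarity >= threshold."""
    return cosine_similarity(reference, generated) >= threshold


if __name__ == "__main__":
    # Toy vectors standing in for speaker embeddings.
    print(is_verified([1.0, 0.0], [1.0, 0.1]))
```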

Datasets

See datasets.csv for training data sources.

License

Apache 2.0

Citation

@misc{truthshield2024voicegen,
  title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
  author={TruthShield Team},
  year={2024}
}
