🏆 BrightoSV Anti-Spoofing V1.5 Special Edition

BrightoSV V1.5 SOTA Performance

COMMERCIAL SOTA • GLOBAL RELEASE

The Pinnacle of Voice Biometric Security

BrightoSV V1.5 Special Edition is BrighTO Technology's commercial state-of-the-art (SOTA) voice anti-spoofing model and the culmination of its research in deepfake detection. This release improves on V1.3 by 34-42% on the most challenging attack vectors while maintaining near-perfect acceptance of genuine users.

Engineered for Zero-Trust Banking, National Security, and Enterprise Authentication, V1.5 neutralizes threats from the latest GenAI voice synthesis, Deepfake splicing (LAV-DF), and sophisticated Neural Codec attacks.

🏆 Key Performance Indicators — V1.5 Special Edition

Bank-Grade Benchmarks (4-Second QA Gate)

Evaluated on 400,000+ open-set samples with strict QA gates (Audio ≥ 4s, SNR ≥ 10dB):

Metric	Value	Significance
EER (Standard)	0.22%	🏆 Commercial SOTA. Sub-0.25% on the hardest in-the-wild benchmarks.
LAV-DF EER	0.22%	🛡️ Deepfake Splice Neutralized. Same error rate as standard attacks; no longer a vulnerability.
LAV-DF FRR @ FAR=0.01%	1.53%	🔒 Bank-Grade Security. Only ~1.5% retry rate at the maximum security threshold.
Telephony Real EER	0.06%	✅ Real User Excellence. Near-perfect acceptance for genuine users over the phone.
Clean TTS EER	0.22%	🛡️ Full Clean TTS Coverage. Same EER as the standard benchmark.
Clean TTS FRR @ FAR=0.01%	0.63%	👑 Extreme UX with extreme security. Fewer than 7 in 1,000 genuine attempts need a retry at the extreme threshold of FAR = 1/10,000.
Accuracy	99.85%	Uncompromising precision for mission-critical decisions.
Latency ⚡	< 50ms	Real-time processing for streaming authentication.

Extended Coverage Benchmarks (2-Second QA Gate)

For scenarios requiring shorter audio acceptance (mobile, call centers):

Metric	Value	Significance
EER (Standard)	0.17%	🏆 Best-in-Class. Industry-leading performance on 2s audio.
LAV-DF EER	0.17%	🛡️ Full LAV-DF Coverage. Most deepfake splices are shorter than 4s, so this gate captures all of them.
LAV-DF FRR @ FAR=0.01%	1.21%	🔒 Optimized Security. 38% improvement vs V1.3 at the same threshold.
Clean TTS EER	0.17%	🛡️ Full Clean TTS Coverage. Same EER as the standard benchmark.
Clean TTS FRR @ FAR=0.01%	0.31%	👑 Extreme UX with extreme security. About 3 in 1,000 genuine attempts need a retry at the extreme threshold of FAR = 1/10,000.
Telephony Real EER	0.04%	✅ Exceptional UX. Roughly 1 in 2,500 legitimate users need a retry.
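
The two headline metrics in these tables, EER and FRR at a fixed FAR, can be reproduced from raw scores. The following is a minimal sketch, assuming the scoring rule stated later in this card (score < threshold means REAL); the function names are illustrative and not part of any BrightoSV API, and the brute-force threshold sweep is chosen for clarity rather than speed.

```python
def far_frr(real_scores, spoof_scores, threshold):
    """Return (FAR, FRR) when score < threshold is accepted as REAL."""
    far = sum(s < threshold for s in spoof_scores) / len(spoof_scores)
    frr = sum(s >= threshold for s in real_scores) / len(real_scores)
    return far, frr


def _candidate_thresholds(real_scores, spoof_scores):
    # Every observed score is a possible operating point; +inf accepts everything.
    return sorted(set(real_scores) | set(spoof_scores)) + [float("inf")]


def frr_at_far(real_scores, spoof_scores, target_far):
    """Return (threshold, FRR) at the highest threshold whose FAR <= target_far."""
    best = None
    for thr in _candidate_thresholds(real_scores, spoof_scores):
        far, frr = far_frr(real_scores, spoof_scores, thr)
        if far > target_far:
            break  # FAR only grows as the threshold rises
        best = (thr, frr)
    return best


def equal_error_rate(real_scores, spoof_scores):
    """Return the error rate at the threshold where FAR and FRR are closest."""
    best_gap, eer = float("inf"), None
    for thr in _candidate_thresholds(real_scores, spoof_scores):
        far, frr = far_frr(real_scores, spoof_scores, thr)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

For example, `frr_at_far(real, spoof, 0.0001)` corresponds to the "FRR @ FAR=0.01%" rows: an FRR of 0.0063 is the 0.63% figure, or about 6.3 retries per 1,000 genuine attempts.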

💀 Adversary Elimination — The "Kill" Statistics

We define "Killed" as forcing the adversary's best possible samples into extreme negative score ranges, creating an unbridgeable Tail Gap between Real and Spoof distributions.

Threat Category	Tail Gap (4s)	Tail Gap (2s)	Status	Description
Clean TTS	> 0.93	> 1.15	KILLED 💀	Modern AI voices (F5-TTS, XTTS, CosyVoice) are mathematically isolated.
LAV-DF (Deepfake Splice)	> 0.11	> 0.15	KILLED 💀	42% wider margin than V1.3. Spliced deepfakes neutralized.
Vocoder (Neural Codec)	> 1.92	> 1.90	KILLED 💀	Legacy and modern neural artifacts completely solved.
Replay Attacks	> 1.14	> 1.41	KILLED 💀	Physical replay attempts detected with extreme confidence.

Hardest 5% Tail Composition — Vulnerability Surface Analysis

The hardest 5% of spoof samples represents the model's true attack surface. V1.5 has eliminated systematic vulnerabilities:

Attack Type	V1.2 (Baseline)	V1.5 Special Edition	Improvement
LAV-DF Dominance	66.5%	4.7%	−61.8pp ✅
Clean TTS+LAV Combined	88.8%	42.4%	−46.4pp ✅
Noisy (Noise Floor)	11.2%	57.6%	Random noise, not exploitable

When random noise dominates the hardest tail, no systematic attack pattern remains for an adversary to exploit.
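
The tail composition above can be computed with a few lines of code. This is a sketch under one loud assumption: "hardest" spoof samples are taken to be those with the lowest scores, meaning closest to the genuine side of the inverted scale described in the Scoring Convention section. The card does not state how the tail was selected, and the function name and input layout are hypothetical.

```python
from collections import Counter


def hardest_tail_composition(spoof_samples, fraction=0.05):
    """Share of each attack type among the hardest spoof samples.

    spoof_samples: list of (score, attack_type) pairs.
    Assumption: lower score = more real-like = harder to reject.
    """
    ranked = sorted(spoof_samples, key=lambda item: item[0])
    k = max(1, round(len(ranked) * fraction))  # size of the hardest tail
    counts = Counter(attack_type for _, attack_type in ranked[:k])
    return {attack_type: n / k for attack_type, n in counts.items()}
```

A result such as `{"noisy": 0.576, "clean_tts": 0.377, "lav_df": 0.047}` would correspond to the V1.5 column of the table, where the LAV-DF share is 4.7% and Clean TTS and LAV-DF together make up 42.4%.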

🆚 Competitive Benchmarking — V1.5 vs Industry

Benchmarked against top-tier academic architectures and commercial standards on 400,000+ open-set evaluation samples:

Model / Solution	EER (4s)	LAV-DF FRR@0.01%	Tail Gap	Short Audio (2s)
BrightoSV V1.5 SE 🇻🇳	0.22% 👑	1.53% 👑	> 0.11 👑	0.17% EER 👑
BrightoSV V1.3	0.28%	2.33%	0.078	0.22% EER
BrightoSV V1.2	0.12%	0.38%*	> 0.20	0.91% EER
ASVspoof 2024 SOTA	~5.56%	N/A	Negative	Collapses
ASVspoof 2021 DF	~20.0%	N/A	Overlap	Very Low
Commercial APIs	~1.5-3.0%	~5-10%	< 0.5	Medium

*V1.2 metrics on different eval protocol; V1.5 uses stricter bank-grade methodology.

🛡️ Threat Model Coverage

BrightoSV V1.5 is trained to detect 200+ unique spoofing attack types with Zero-Trust architecture:

🚫 Zero-Day Defense (Future-Proofing)

Unlike detectors that memorize known attacks, BrightoSV uses One-Class Learning enhanced with DPO (Direct Preference Optimization) to model the manifold of genuine human speech. Unknown future attacks are rejected because they deviate from the "True Real" center.

🧬 Logical Access Attacks (TTS & VC)

Diffusion/Flow Models: F5-TTS, CosyVoice, NaturalSpeech 2/3, Stable Audio
Neural Codecs: VALL-E, XTTS, Bark, Tortoise, Voicebox
Voice Conversion: RVC, So-VITS-SVC, OpenVoice, FreeVC

🔊 Physical Access Attacks

Replay Attacks: High-fidelity recordings via speakers, phones, professional equipment
Channel Manipulation: Codec artifacts, compression, transmission distortion

🧩 Partial Spoofing (Deepfake Injection) — V1.5 Specialty

LAV-DF (Localized Audio Visual DeepFake): Spliced/injected segments where only portions are manipulated
Boundary Detection: Micro-discontinuities at splice points identified via advanced temporal modeling

🎯 Production Deployment Configuration

Scoring Convention

BrightoSV outputs a score based on cosine similarity to the OC-Softmax center, reported on an inverted scale: lower scores indicate more genuine speech.

Score Range	Meaning	Interpretation
Near −1.0	High similarity to "True Real" center	✅ Genuine speech (Giọng thật)
Near +1.0	Far from "True Real" center	❌ Spoofed / Fake (Giọng giả)

Decision Logic: score < threshold → REAL · score ≥ threshold → SPOOF
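
The convention above can be sketched in code. This assumes that "inverted scale" means the cosine similarity is simply negated, which is consistent with the table but is not confirmed by the card; `inverted_cosine_score` and `decide` are illustrative names, not part of a published API.

```python
import math


def inverted_cosine_score(embedding, center):
    """Negated cosine similarity: -1.0 means aligned with the "True Real" center."""
    dot = sum(a * b for a, b in zip(embedding, center))
    norm = math.sqrt(sum(a * a for a in embedding)) * math.sqrt(sum(b * b for b in center))
    return -dot / norm


def decide(score, threshold):
    """Apply the card's rule: score < threshold is REAL, otherwise SPOOF."""
    return "REAL" if score < threshold else "SPOOF"
```

Because the comparison is strict, a score exactly equal to the threshold is classified as SPOOF, so ties fall on the secure side.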

🏦 Mode 1: `bank_top_security` — Maximum Security Banking

For: Large wire transfers (>$10,000), account changes, new device authorization, vault access.

Parameter	Value
Min Audio	≥ 4.0s (after silence trim)
Window / Stride	4s / 2s
Aggregation	bottom quantile (q=0.1)
Min SNR	≥ 10.0 dB

Option	Threshold	Expected FRR	LAV-DF Protection	Use Case
A (Balanced)	−0.96	~0.5%	Excellent	Production default
B (Maximum)	−0.98	~1.5%	Bank-grade	High-value transactions

# bank_top_security config
{
    "min_audio_sec": 4.0,
    "win_sec": 4.0,
    "stride_sec": 2.0,
    "min_snr_db": 10.0,
    "threshold": -0.96,        # Option A (Balanced)
    # "threshold": -0.98,      # Option B (Maximum Security)
    "aggregation": "bottom",
    "quantile": 0.3,           # 0.3 is sharper than 0.1
}
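
The window, stride, aggregation, and threshold settings combine into a per-utterance decision roughly as follows. This is a sketch, not the vendor's implementation: the card does not specify how "bottom" aggregation maps onto the inverted score scale, so the code assumes it means the lower q-quantile of the per-window scores (switching to the upper quantile would be a one-line change). The per-window scores are assumed to come from the model, and all function names are hypothetical.

```python
def window_bounds(duration_sec, win_sec, stride_sec):
    """List (start, end) times of every full window that fits in the audio."""
    bounds, start = [], 0.0
    while start + win_sec <= duration_sec + 1e-9:
        bounds.append((start, start + win_sec))
        start += stride_sec
    return bounds


def quantile(values, q):
    """Linear-interpolated q-quantile (0 <= q <= 1) of a non-empty list."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def score_utterance(window_scores, config):
    """Aggregate per-window scores and apply the config threshold."""
    aggregated = quantile(window_scores, config["quantile"])
    label = "REAL" if aggregated < config["threshold"] else "SPOOF"
    return aggregated, label
```

With `win_sec=4.0` and `stride_sec=2.0`, an 8-second clip yields three overlapping windows, and the model would be run once per window before `score_utterance` is called.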

🏛️ Mode 2: `bank_flex` — Flexible Banking

For: Mobile banking login, eKYC onboarding, call center verification, standard transactions.

Parameter	Value
Min Audio	≥ 2.0s (after silence trim)
Window / Stride	2s / 1s
Aggregation	bottom quantile (q=0.1)
Min SNR	≥ 10.0 dB

Option	Threshold	Expected FRR	LAV-DF Protection	Use Case
A (Balanced)	−0.95	~0.35%	Excellent	Mobile banking, eKYC
B (Secure)	−0.97	~0.8%	Enhanced	FinTech compliance
C (Maximum)	−0.98	~1.2%	Bank-grade	Regulatory audit

# bank_flex config
{
    "min_audio_sec": 2.0,
    "win_sec": 2.0,
    "stride_sec": 1.0,
    "min_snr_db": 10.0,
    "threshold": -0.95,        # Option A (Balanced)
    # "threshold": -0.97,      # Option B (Secure)
    # "threshold": -0.98,      # Option C (Maximum)
    "aggregation": "bottom",
    "quantile": 0.3,           # 0.3 is sharper than 0.1
}

📱 Mode 3: `consumer` — Consumer Applications

For: Smart home voice control, app login, voice assistants, general authentication.

Parameter	Value
Min Audio	≥ 2.0s
Window / Stride	2s / 1s
Aggregation	bottom quantile (q=0.1)
Min SNR	≥ 10.0 dB

Option	Threshold	Expected FRR	Protection	Use Case
A (UX Priority)	−0.90	~0.1%	Strong	Seamless consumer UX
B (Balanced)	−0.92	~0.2%	Very Strong	Consumer + security
C (Cautious)	−0.95	~0.35%	Excellent	High-security consumer

# consumer config
{
    "min_audio_sec": 2.0,
    "win_sec": 2.0,
    "stride_sec": 1.0,
    "min_snr_db": 10.0,
    "threshold": -0.90,        # Option A (UX Priority)
    # "threshold": -0.92,      # Option B (Balanced)
    # "threshold": -0.95,      # Option C (Cautious)
    "aggregation": "bottom",
    "quantile": 0.3,           # 0.3 is sharper than 0.1
}

🚨 QA Gate — Mandatory Pre-Check

All audio MUST pass QA before inference. Never pass low-quality audio to the model.

Check	Requirement	Action on Fail
Duration	≥ 2.0s (flex) or ≥ 4.0s (top_security)	Reject → Prompt re-record
SNR	≥ 10.0 dB	Reject → Prompt re-record

Without the QA gate, FRR @ FAR=0.01% increases 14x. The QA gate is a production necessity, not an option.
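
A minimal sketch of the pre-check is shown below, using the duration and SNR limits listed in this card. How duration (after silence trimming) and SNR are measured is not described here, so both are assumed to be computed upstream and passed in; `qa_gate` is an illustrative name.

```python
# Minimum trimmed duration per mode, in seconds, as listed in the mode tables.
MIN_DURATION_SEC = {
    "bank_top_security": 4.0,
    "bank_flex": 2.0,
    "consumer": 2.0,
}
MIN_SNR_DB = 10.0


def qa_gate(duration_sec, snr_db, mode):
    """Return (passed, failed_check); failed_check is None when the audio passes."""
    if duration_sec < MIN_DURATION_SEC[mode]:
        return False, "duration"  # reject and prompt a re-record
    if snr_db < MIN_SNR_DB:
        return False, "snr"       # reject and prompt a re-record
    return True, None
```

Only audio for which `qa_gate` returns `(True, None)` should be forwarded to inference.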

Threshold Selection Guide

Scenario	Recommended Mode	Recommended Option
Wire transfers > $10,000	`bank_top_security`	B (Maximum)
Standard banking operations	`bank_top_security`	A (Balanced)
Mobile banking / eKYC	`bank_flex`	A (Balanced)
Call center verification	`bank_flex`	B (Secure)
FinTech regulatory compliance	`bank_flex`	C (Maximum)
Smart home / IoT	`consumer`	A (UX Priority)
App login (general)	`consumer`	B (Balanced)
Airport / National security	`bank_top_security`	B (Maximum)
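
The guide can be encoded as a lookup so that application code never hard-codes a threshold. The sketch below copies the modes, options, and thresholds from the tables above; the scenario keys are hypothetical identifiers chosen for this example.

```python
# Thresholds per mode and option, copied from the mode tables.
THRESHOLDS = {
    "bank_top_security": {"A": -0.96, "B": -0.98},
    "bank_flex": {"A": -0.95, "B": -0.97, "C": -0.98},
    "consumer": {"A": -0.90, "B": -0.92, "C": -0.95},
}

# Scenario -> (mode, option), following the Threshold Selection Guide.
SCENARIOS = {
    "wire_transfer_over_10k": ("bank_top_security", "B"),
    "standard_banking": ("bank_top_security", "A"),
    "mobile_banking_ekyc": ("bank_flex", "A"),
    "call_center": ("bank_flex", "B"),
    "fintech_compliance": ("bank_flex", "C"),
    "smart_home_iot": ("consumer", "A"),
    "app_login": ("consumer", "B"),
    "national_security": ("bank_top_security", "B"),
}


def threshold_for(scenario):
    """Return (mode, option, threshold) for a named scenario."""
    mode, option = SCENARIOS[scenario]
    return mode, option, THRESHOLDS[mode][option]
```

An unknown scenario raises `KeyError`, which is a deliberate choice in this sketch: failing loudly is safer than silently falling back to a lenient threshold.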

🛡️ Training Methodology — The "Resilience" Engine

BrightoSV V1.5 is trained on a dataset of more than 15 million samples covering 200+ attack types, multiple languages, accents, and recording conditions. The proprietary training methodology builds resilience through 4 technical pillars:

1️⃣ Environmental & Acoustic Resilience

We simulate billions of real-world acoustic environments (Reverb, Room Impulse Response) and chaotic noise profiles. The model is trained to remain robust down to 5 dB SNR, so that it focuses on the source identity rather than background noise.

Category	Examples	Technical Goal
🐾 Animals	Dog barking, Rooster crowing, Pig grunting, Cat meowing	Resilience against sudden biological sounds
🏠 Home	Ticking clocks, Vacuum cleaners, Washing machines	Reliability for WFH environments
🗣️ Human (Hardest)	Coughing, Sneezing, Baby crying, Footsteps	Disentangle target speaker from human artifacts
🏙️ Urban	Street noise, Sirens, Car horns, Trains	On-the-go verification robustness
⛈️ Natural	Heavy rain, Wind gusts, Thunder	Outdoor environment stability
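
Mixing a noise clip into speech at a chosen SNR, such as the 5 dB floor mentioned above, follows a standard formula: scale the noise so that the ratio of speech power to noise power equals 10^(SNR/10). The sketch below shows that generic technique on plain lists of samples; it is not BrighTO's augmentation pipeline, and the function names are illustrative.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a signal."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise is silent; SNR is undefined")
    scale = math.sqrt(mean_power(speech) / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """SNR of a mixture, given the clean speech it was built from."""
    residual = [m - s for m, s in zip(mixed, speech)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))
```

Calling `measured_snr_db(speech, mix_at_snr(speech, noise, 5.0))` should return a value very close to 5.0, which is a quick sanity check for an augmentation step.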

2️⃣ Telephony & Channel Robustness

By simulating various transmission codecs (GSM, VoIP such as Zalo/WhatsApp, MP3, AAC) and microphone distortions (RawBoost), the model learns to distinguish benign compression artifacts from malicious deepfake artifacts. It won't fail just because the user has a poor connection.

3️⃣ Artifact Preservation Strategy: Asymmetric Learning

Unlike standard approaches that "over-clean" data, our pipeline uses a surgical approach:

REAL speech: Augmented heavily → builds robustness
FAKE speech: Treated delicately → preserves microscopic digital artifacts (vocoder buzz, phase discontinuities)

This ensures that modern Neural Codec traces are never washed away during training.

4️⃣ Multilingual Generalization

Trained on a diverse multilingual corpus (Vietnamese, English, Chinese, German, French, Japanese, Arabic, Dutch...), the model focuses on universal spoofing traces rather than language-specific phonemes, offering global protection regardless of language or accent.

⚙️ Technical Specifications

Specification	Value
Model Version	V1.5 Special Edition (Commercial SOTA)
Architecture	Audio Self-Supervised Learning Base + GAT
Parameters	316M (High-Capacity Backbone)
Input Sample Rate	16kHz (Auto-resampling supported)
Input Formats	WAV, MP3, OGG, FLAC, M4A
Min Duration	2.0s (flex) / 4.0s (top_security)
Min SNR	10.0 dB
Output	Cosine similarity score [−1.0, +1.0]

🚀 Hardware & Performance

Specification	Value
GPU Support	NVIDIA GTX 1080+, A10G, A100, H100, L4, T4
CPU Support	Intel Xeon, AMD EPYC (via ONNX/OpenVINO)
Inference Latency	< 50ms (RTX 3090)
Model Size	~1.2 GB
Batch Processing	Supported for enterprise throughput
Concurrent Requests	High-density deployment ready

🌍 Application Scenarios

Optimized for sectors requiring maximum security:

Sector	Use Case	Recommended Mode
🏦 Banking & Finance	Wire transfers, Voice Banking	`bank_top_security`
🆔 eKYC	Customer onboarding, Liveness	`bank_flex`
🪙 Crypto & FinTech	Wallet protection, Transaction auth	`bank_top_security`
✈️ National Security	Border control, Immigration	`bank_top_security`
🚓 Corrections	Inmate communication monitoring	`bank_flex`
🎧 Call Centers	Real-time deepfake detection	`bank_flex`
📱 Consumer Apps	Voice login, Smart home	`consumer`

📈 Version History

Version	Release	Highlight	Status
V1.5 SE	Feb 2026	Commercial SOTA — 34-42% LAV-DF improvement	🟢 Current
V1.3	Jan 2026	LAV-DF breakthrough (66% → 5% tail)	Production
V1.2	Dec 2025	Bank-grade baseline	Legacy

🔒 Security & Compliance

Aspect	Implementation
Deployment	On-premise (Docker/K8s) or Private Cloud
Data Privacy	Zero-retention — Audio processed in RAM only
Encryption	End-to-end TLS 1.3
Compliance	GDPR, PCI-DSS, SOC 2 ready
Audit Trail	Full logging capability (no audio stored)

📞 Access & Licensing

This model is Private and available exclusively to enterprise partners under NDA.

Commercial & Deployment

Full-package license or API access
Integration support on request (deployment, performance optimization, quality monitoring)
SphinX Joint Stock Company (sphinxjsc.com) is authorized to package the model, provide the API, and handle distribution

Copyright & License

Commercial / Proprietary. Use, redistribution, or creation of derivative works requires written approval from BrighTO Technology.

Contact

Purpose	Contact
Commercial Licensing	`nguyen@brighto.ai`, `nghia@brighto.ai`
API & Distribution	`duc@sphinxjsc.com` (SphinX JSC)
Technical Inquiries	`nguyen@hatto.com`

🏆 BrightoSV V1.5 Special Edition

Commercial SOTA • Global Ready • Bank-Grade Security

Built in Vietnam 🇻🇳 • Engineered for the World 🌏

This model card refers to BrightoSV Anti-Spoof V1.5 Special Edition (Commercial SOTA Release). All benchmark results are verified on internal test sets comprising 400,000+ open-set samples with strict bank-grade QA methodology.


Evaluation results (self-reported)

Metric	Value
Equal Error Rate (%), 2s Gate	0.170
Equal Error Rate (%), 4s Gate	0.220
Accuracy (%)	99.850
LAV-DF FRR @ FAR=0.01% (%), 2s Gate	1.210
LAV-DF FRR @ FAR=0.01% (%), 4s Gate	1.530