🏆 BrightoSV Anti-Spoofing V1.5 Special Edition

BrightoSV V1.5 SOTA Performance

COMMERCIAL SOTA • GLOBAL RELEASE

The Pinnacle of Voice Biometric Security


BrightoSV V1.5 Special Edition is BrighTO Technology's commercial state-of-the-art (SOTA) voice anti-spoofing model and the culmination of its research in deepfake detection. Compared with V1.3, this release improves results by 34-42% on the most challenging attack vectors (notably LAV-DF deepfake splicing) while maintaining near-perfect acceptance of real users.

Engineered for Zero-Trust Banking, National Security, and Enterprise Authentication, V1.5 neutralizes threats from the latest GenAI voice synthesis, Deepfake splicing (LAV-DF), and sophisticated Neural Codec attacks.


🏆 Key Performance Indicators — V1.5 Special Edition

Bank-Grade Benchmarks (4-Second QA Gate)

Evaluated on 400,000+ open-set samples with strict QA gates (Audio ≥ 4s, SNR ≥ 10dB):

| Metric | Value | Significance |
|---|---|---|
| EER (Standard) | 0.22% | 🏆 Commercial SOTA. Below 0.25% on the hardest in-the-wild benchmarks. |
| LAV-DF EER | 0.22% | 🛡️ Deepfake splices neutralized. Same error rate as standard attacks, so splicing is no longer a vulnerability. |
| LAV-DF FRR @ FAR=0.01% | 1.53% | 🔒 Bank-grade security. Only ~1.5% of genuine attempts need a retry at the maximum-security threshold. |
| Telephony Real EER | 0.06% | Real-user excellence. Near-perfect acceptance of genuine users over the phone. |
| Clean TTS EER | 0.22% | 🛡️ Full Clean TTS coverage. |
| Clean TTS FRR @ FAR=0.01% | 0.63% | 👑 Extreme UX with extreme security. Fewer than 7 in 1,000 genuine attempts need a retry at the extreme threshold of FAR = 1/10,000. |
| Accuracy | 99.85% | Uncompromising precision for mission-critical decisions. |
| Latency | < 50 ms | Real-time processing for streaming authentication. |
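
The EER figures above assume the inverted scoring convention described under Scoring Convention: a sample is accepted as REAL when its score is below the threshold. The sketch below shows one way to compute EER from labelled genuine and spoof scores under that convention. It is an illustration only, not BrightoSV's evaluation code, and `compute_eer` and the toy scores are hypothetical.

```python
import numpy as np


def compute_eer(real_scores, spoof_scores):
    """Equal error rate for scores where LOWER means more genuine.

    A sample is accepted as REAL when score < threshold, so
    FRR = share of real samples with score >= threshold and
    FAR = share of spoof samples with score < threshold.
    Returns (eer, threshold).
    """
    real = np.asarray(real_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Sweep every observed score, plus one point above the max (accept everything).
    thresholds = np.unique(np.concatenate([real, spoof]))
    thresholds = np.append(thresholds, thresholds[-1] + 1e-6)
    frr = np.array([(real >= t).mean() for t in thresholds])
    far = np.array([(spoof < t).mean() for t in thresholds])
    # EER is where the two curves cross; take the closest sweep point.
    i = int(np.argmin(np.abs(frr - far)))
    return float((frr[i] + far[i]) / 2), float(thresholds[i])


eer, thr = compute_eer([-0.99, -0.98, -0.97, -0.5], [0.9, 0.8, -0.6, 0.7])
print(f"EER = {eer:.2%} at threshold {thr}")
```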

Extended Coverage Benchmarks (2-Second QA Gate)

For scenarios requiring shorter audio acceptance (mobile, call centers):

| Metric | Value | Significance |
|---|---|---|
| EER (Standard) | 0.17% | 🏆 Best-in-class. Industry-leading performance on 2-second audio. |
| LAV-DF EER | 0.17% | 🛡️ Full LAV-DF coverage. Most deepfake splices are shorter than 4 s, and this gate captures all of them. |
| LAV-DF FRR @ FAR=0.01% | 1.21% | 🔒 Optimized security. 38% improvement over V1.3 at the same threshold. |
| Clean TTS EER | 0.17% | 🛡️ Full Clean TTS coverage. |
| Clean TTS FRR @ FAR=0.01% | 0.31% | 👑 Extreme UX with extreme security. About 3 in 1,000 genuine attempts need a retry at the extreme threshold of FAR = 1/10,000. |
| Telephony Real EER | 0.04% | Exceptional UX. Only about 1 in 2,500 legitimate users needs a retry. |
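
FRR @ FAR=0.01% fixes the threshold so that at most 1 in 10,000 spoofs is accepted, then reports how many genuine attempts are rejected at that threshold. The sketch below uses the same inverted convention and assumes a simple empirical threshold choice; the card does not say how BrightoSV picks it. `frr_at_far` is a hypothetical helper.

```python
import numpy as np


def frr_at_far(real_scores, spoof_scores, target_far=1e-4):
    """FRR at the loosest threshold whose FAR stays <= target_far.

    Scores follow the inverted convention (accept as REAL if score < threshold).
    target_far=1e-4 corresponds to FAR = 0.01%.
    Returns (frr, threshold).
    """
    real = np.asarray(real_scores, dtype=float)
    spoof = np.sort(np.asarray(spoof_scores, dtype=float))
    # At most k spoofs may be accepted without exceeding the FAR budget.
    k = int(np.floor(target_far * len(spoof)))
    # Only spoofs strictly below spoof[k] are accepted, i.e. at most k of them.
    threshold = spoof[k] if k < len(spoof) else np.inf
    return float((real >= threshold).mean()), float(threshold)


frr, thr = frr_at_far([-0.9, -0.8, 0.0, 0.3], [0.9, -0.5, 0.5, 0.2], target_far=0.25)
print(f"FRR = {frr:.2%} at threshold {thr}")
```

At a realistic FAR of 0.01%, `k` stays at 0 until the spoof set holds at least 10,000 samples. The metric is therefore only meaningful on large evaluation sets.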

💀 Adversary Elimination — The "Kill" Statistics

We define "Killed" as forcing the adversary's best possible samples into extreme negative score ranges, creating an unbridgeable Tail Gap between Real and Spoof distributions.

| Threat Category | Tail Gap (4s) | Tail Gap (2s) | Status | Description |
|---|---|---|---|---|
| Clean TTS | > 0.93 | > 1.15 | KILLED 💀 | Modern AI voices (F5-TTS, XTTS, CosyVoice) are mathematically isolated. |
| LAV-DF (Deepfake Splice) | > 0.11 | > 0.15 | KILLED 💀 | 42% wider margin than V1.3. Spliced deepfakes neutralized. |
| Vocoder (Neural Codec) | > 1.92 | > 1.90 | KILLED 💀 | Legacy and modern neural artifacts completely solved. |
| Replay Attacks | > 1.14 | > 1.41 | KILLED 💀 | Physical replay attempts detected with extreme confidence. |
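
The card does not give a formula for Tail Gap. One plausible reading, used here purely as an assumption, is the distance between the adversary's best (lowest-scoring) spoof and the least confident (highest-scoring) genuine sample, computed per threat category. `tail_gap` and the toy scores are hypothetical.

```python
import numpy as np


def tail_gap(real_scores, spoof_scores, tail=0.0):
    """Gap between the most real-looking spoofs and the most spoof-looking reals.

    Inverted convention: genuine speech scores near -1, spoofs near +1.
    tail=0.0 compares the extremes (min spoof vs max real); a small tail such
    as 0.001 uses quantiles instead, which is less sensitive to single outliers.
    A positive result means the two distributions do not overlap.
    """
    real = np.asarray(real_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    hardest_spoof = np.quantile(spoof, tail)       # adversary's best samples
    hardest_real = np.quantile(real, 1.0 - tail)   # least confident genuine samples
    return float(hardest_spoof - hardest_real)


real = [-0.99, -0.97, -0.95]
by_attack = {"clean_tts": [0.2, 0.5, 0.9], "lav_df": [-0.85, -0.4, 0.3]}
for attack, scores in by_attack.items():
    print(attack, round(tail_gap(real, scores), 3))
```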

Hardest 5% Tail Composition — Vulnerability Surface Analysis

The hardest 5% of spoof samples represents the model's true attack surface. V1.5 has eliminated systematic vulnerabilities:

| Attack Type | V1.2 (Baseline) | V1.5 Special Edition | Improvement |
|---|---|---|---|
| LAV-DF Dominance | 66.5% | 4.7% | −61.8 pp |
| Clean TTS + LAV Combined | 88.8% | 42.4% | −46.4 pp |
| Noisy (Noise Floor) | 11.2% | 57.6% | Random noise, not exploitable |

When random noise dominates the hardest tail, systematic attack patterns have been solved.
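
The composition analysis can be reproduced on any labelled score set. Rank spoofs from most to least real-looking, then count attack types in the top slice. The sketch assumes that lower scores are harder (per the inverted scale) and that each spoof carries an attack-type label; all names are hypothetical.

```python
from collections import Counter

import numpy as np


def hardest_tail_composition(spoof_scores, attack_types, frac=0.05):
    """Share of each attack type among the hardest `frac` of spoof samples.

    Under the inverted convention the hardest spoofs are those with the
    LOWEST scores (closest to the genuine-speech region).
    """
    scores = np.asarray(spoof_scores, dtype=float)
    n = max(1, int(round(frac * len(scores))))
    hardest = np.argsort(scores, kind="stable")[:n]
    counts = Counter(attack_types[i] for i in hardest)
    return {attack: count / n for attack, count in counts.most_common()}


scores = list(range(20))
types = ["lav_df", "noisy"] + ["clean_tts"] * 18
print(hardest_tail_composition(scores, types, frac=0.1))
```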


🆚 Competitive Benchmarking — V1.5 vs Industry

Benchmarked against top-tier academic architectures and commercial standards on 400,000+ open-set evaluation samples:

| Model / Solution | EER (4s) | LAV-DF FRR @ FAR=0.01% | Tail Gap | Short Audio (2s) |
|---|---|---|---|---|
| BrightoSV V1.5 SE 🇻🇳 | 0.22% 👑 | 1.53% 👑 | > 0.11 👑 | 0.17% EER 👑 |
| BrightoSV V1.3 | 0.28% | 2.33% | 0.078 | 0.22% EER |
| BrightoSV V1.2 | 0.12% | 0.38%* | > 0.20 | 0.91% EER |
| ASVspoof 2024 SOTA | ~5.56% | N/A | Negative | Collapses |
| ASVspoof 2021 DF | ~20.0% | N/A | Overlap | Very Low |
| Commercial APIs | ~1.5-3.0% | ~5-10% | < 0.5 | Medium |

*V1.2 metrics on different eval protocol; V1.5 uses stricter bank-grade methodology.


🛡️ Threat Model Coverage

BrightoSV V1.5 is trained to detect 200+ unique spoofing attack types with Zero-Trust architecture:

🚫 Zero-Day Defense (Future-Proofing)

Unlike detectors that memorize known attacks, BrightoSV uses One-Class Learning enhanced with DPO (Direct Preference Optimization) to model the precise manifold of genuine human speech. Unknown future attacks are automatically rejected because they deviate from the "True Real" center.

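The card names One-Class Learning and an OC-Softmax center, but gives neither the loss hyperparameters nor how DPO is combined with it. For orientation, the sketch below implements the standard OC-Softmax loss. It uses the margins commonly published with that loss (m_real = 0.9, m_spoof = 0.2, alpha = 20) as assumptions, not BrightoSV's settings, and it omits the DPO stage.

```python
import numpy as np


def oc_softmax_loss(embeddings, labels, center, m_real=0.9, m_spoof=0.2, alpha=20.0):
    """One-class softmax loss (labels: 0 = genuine, 1 = spoof).

    Genuine embeddings are pushed to cosine > m_real with the center,
    spoof embeddings to cosine < m_spoof; spoofs are never clustered.
    """
    x = np.asarray(embeddings, dtype=float)
    w = np.asarray(center, dtype=float)
    cos = (x / np.linalg.norm(x, axis=1, keepdims=True)) @ (w / np.linalg.norm(w))
    y = np.asarray(labels)
    margin = np.where(y == 0, m_real, m_spoof)
    sign = np.where(y == 0, 1.0, -1.0)  # (-1)^y
    # log(1 + exp(z)) computed stably as logaddexp(0, z).
    return float(np.mean(np.logaddexp(0.0, alpha * (margin - cos) * sign)))


center = [1.0, 0.0]
emb = [[1.0, 0.0], [-1.0, 0.0]]
print(oc_softmax_loss(emb, [0, 1], center))  # correctly labelled: small loss
print(oc_softmax_loss(emb, [1, 0], center))  # labels swapped: large loss
```

Only genuine speech is pulled toward the center; spoofs are merely pushed below a margin. This design is what lets unseen attack types fall outside the genuine region.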

🧬 Logical Access Attacks (TTS & VC)

  • Diffusion/Flow Models: F5-TTS, CosyVoice, NaturalSpeech 2/3, Stable Audio
  • Neural Codecs: VALL-E, XTTS, Bark, Tortoise, Voicebox
  • Voice Conversion: RVC, So-VITS-SVC, OpenVoice, FreeVC

🔊 Physical Access Attacks

  • Replay Attacks: High-fidelity recordings via speakers, phones, professional equipment
  • Channel Manipulation: Codec artifacts, compression, transmission distortion

🧩 Partial Spoofing (Deepfake Injection) — V1.5 Specialty

  • LAV-DF (Localized Audio Visual DeepFake): Spliced/injected segments where only portions are manipulated
  • Boundary Detection: Micro-discontinuities at splice points identified via advanced temporal modeling

🎯 Production Deployment Configuration

Scoring Convention

BrightoSV outputs cosine similarity to the OC-Softmax center (inverted scale):

| Score Range | Meaning | Interpretation |
|---|---|---|
| Near −1.0 | High similarity to the "True Real" center | ✅ Genuine speech |
| Near +1.0 | Far from the "True Real" center | ❌ Spoofed / fake |

Decision Logic: score < threshold → REAL · score ≥ threshold → SPOOF
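
Below is a minimal sketch of the decision logic. It assumes the inverted score is the negative cosine similarity between an utterance embedding and the "True Real" center; the card says "inverted scale" but not the exact transform. `spoof_score` and `decide` are hypothetical.

```python
import numpy as np


def spoof_score(embedding, real_center):
    """Inverted-scale score: ASSUMED to be the negative cosine similarity.

    About -1.0 means the embedding points at the "True Real" center;
    about +1.0 means it points away from it.
    """
    e = np.asarray(embedding, dtype=float)
    c = np.asarray(real_center, dtype=float)
    return float(-(e @ c) / (np.linalg.norm(e) * np.linalg.norm(c)))


def decide(score, threshold):
    """score < threshold -> REAL; score >= threshold -> SPOOF."""
    return "REAL" if score < threshold else "SPOOF"


center = [1.0, 0.0]
print(decide(spoof_score([2.0, 0.0], center), threshold=-0.96))  # REAL
print(decide(spoof_score([0.0, 1.0], center), threshold=-0.96))  # SPOOF
```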


🏦 Mode 1: bank_top_security — Maximum Security Banking

For: Large wire transfers (>$10,000), account changes, new device authorization, vault access.

| Parameter | Value |
|---|---|
| Min Audio | ≥ 4.0 s (after silence trim) |
| Window / Stride | 4 s / 2 s |
| Aggregation | bottom quantile (q=0.1) |
| Min SNR | ≥ 10.0 dB |

| Option | Threshold | Expected FRR | LAV-DF Protection | Use Case |
|---|---|---|---|---|
| A (Balanced) | −0.96 | ~0.5% | Excellent | Production default |
| B (Maximum) | −0.98 | ~1.5% | Bank-grade | High-value transactions |

# bank_top_security config
{
    "min_audio_sec": 4.0,
    "win_sec": 4.0,
    "stride_sec": 2.0,
    "min_snr_db": 10.0,
    "threshold": -0.96,        # Option A (Balanced)
    # "threshold": -0.98,      # Option B (Maximum Security)
    "aggregation": "bottom",
    "quantile": 0.3,           # sharpened (0.3 is sharper than 0.1)
}
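
The window, stride, quantile and threshold settings combine roughly as shown below. This is a sketch, not the BrightoSV pipeline. `score_window` stands in for the model, and the extra end-aligned window is an assumption. "Bottom quantile" is read here as `np.quantile(scores, q)` over per-window scores; if the production code ranks windows the other way (most spoof-like first), use `1 - q` instead.

```python
import numpy as np

SAMPLE_RATE = 16_000  # model input rate (see Technical Specifications)


def window_starts(num_samples, win_sec, stride_sec, sr=SAMPLE_RATE):
    """Start offsets of fixed-length windows; last window is aligned to the end."""
    win, stride = int(win_sec * sr), int(stride_sec * sr)
    if num_samples < win:
        return []  # shorter than one window: the QA gate should reject it first
    starts = list(range(0, num_samples - win + 1, stride))
    if starts[-1] + win < num_samples:  # ASSUMPTION: score the tail too
        starts.append(num_samples - win)
    return starts


def classify(audio, score_window, cfg):
    """Slide, score each window, aggregate with a quantile, then threshold."""
    win = int(cfg["win_sec"] * SAMPLE_RATE)
    starts = window_starts(len(audio), cfg["win_sec"], cfg["stride_sec"])
    if not starts:
        raise ValueError("audio shorter than one window")
    scores = np.array([score_window(audio[s:s + win]) for s in starts])
    # ASSUMPTION: "bottom quantile" = the q-th quantile from the low end.
    agg = float(np.quantile(scores, cfg["quantile"]))
    return ("REAL" if agg < cfg["threshold"] else "SPOOF"), agg


cfg = {"win_sec": 4.0, "stride_sec": 2.0, "quantile": 0.3, "threshold": -0.96}
audio = np.zeros(9 * SAMPLE_RATE)            # 9 s placeholder signal
print(window_starts(len(audio), 4.0, 2.0))   # [0, 32000, 64000, 80000]
print(classify(audio, lambda w: -0.99, cfg))  # ('REAL', -0.99)
```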

🏛️ Mode 2: bank_flex — Flexible Banking

For: Mobile banking login, eKYC onboarding, call center verification, standard transactions.

| Parameter | Value |
|---|---|
| Min Audio | ≥ 2.0 s (after silence trim) |
| Window / Stride | 2 s / 1 s |
| Aggregation | bottom quantile (q=0.1) |
| Min SNR | ≥ 10.0 dB |

| Option | Threshold | Expected FRR | LAV-DF Protection | Use Case |
|---|---|---|---|---|
| A (Balanced) | −0.95 | ~0.35% | Excellent | Mobile banking, eKYC |
| B (Secure) | −0.97 | ~0.8% | Enhanced | FinTech compliance |
| C (Maximum) | −0.98 | ~1.2% | Bank-grade | Regulatory audit |

# bank_flex config
{
    "min_audio_sec": 2.0,
    "win_sec": 2.0,
    "stride_sec": 1.0,
    "min_snr_db": 10.0,
    "threshold": -0.95,        # Option A (Balanced)
    # "threshold": -0.97,      # Option B (Secure)
    # "threshold": -0.98,      # Option C (Maximum)
    "aggregation": "bottom",
    "quantile": 0.3,           # sharpened (0.3 is sharper than 0.1)
}

📱 Mode 3: consumer — Consumer Applications

For: Smart home voice control, app login, voice assistants, general authentication.

| Parameter | Value |
|---|---|
| Min Audio | ≥ 2.0 s |
| Window / Stride | 2 s / 1 s |
| Aggregation | bottom quantile (q=0.1) |
| Min SNR | ≥ 10.0 dB |

| Option | Threshold | Expected FRR | Protection | Use Case |
|---|---|---|---|---|
| A (UX Priority) | −0.90 | ~0.1% | Strong | Seamless consumer UX |
| B (Balanced) | −0.92 | ~0.2% | Very Strong | Consumer use with balanced security |
| C (Cautious) | −0.95 | ~0.35% | Excellent | High-security consumer |

# consumer config
{
    "min_audio_sec": 2.0,
    "win_sec": 2.0,
    "stride_sec": 1.0,
    "min_snr_db": 10.0,
    "threshold": -0.90,        # Option A (UX Priority)
    # "threshold": -0.92,      # Option B (Balanced)
    # "threshold": -0.95,      # Option C (Cautious)
    "aggregation": "bottom",
    "quantile": 0.3,           # sharpened (0.3 is sharper than 0.1)
}

🚨 QA Gate — Mandatory Pre-Check

All audio MUST pass QA before inference. Never pass low-quality audio to the model.

| Check | Requirement | Action on Fail |
|---|---|---|
| Duration | ≥ 2.0 s (bank_flex, consumer) or ≥ 4.0 s (bank_top_security) | Reject and prompt re-record |
| SNR | ≥ 10.0 dB | Reject and prompt re-record |

Without the QA gate, FRR at FAR=0.01% increases 14×. QA is a production necessity, not an option.
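
Below is a sketch of the two QA checks. It rests on clearly labelled assumptions: silence is trimmed with a frame-energy threshold (−40 dB relative to the loudest frame), and SNR is approximated by comparing loud-frame and quiet-frame power. Production systems may use a VAD and a proper SNR estimator. All names here are hypothetical.

```python
import numpy as np

SR = 16_000
FRAME = 400  # 25 ms frames at 16 kHz (ASSUMPTION)


def frame_energies(x, frame=FRAME):
    n = len(x) // frame
    frames = np.asarray(x[: n * frame], dtype=float).reshape(n, frame)
    return (frames ** 2).mean(axis=1) + 1e-12  # mean power per frame


def trim_silence(x, rel_db=-40.0, frame=FRAME):
    """Drop leading/trailing frames more than |rel_db| dB below the loudest frame."""
    e = frame_energies(x, frame)
    if e.size == 0:
        return x[:0]
    voiced = np.flatnonzero(10 * np.log10(e / e.max()) > rel_db)
    return x[voiced[0] * frame:(voiced[-1] + 1) * frame]


def estimate_snr_db(x, frame=FRAME):
    """Crude SNR proxy: loud-frame power over quiet-frame power (ASSUMPTION)."""
    e = frame_energies(x, frame)
    return float(10 * np.log10(np.percentile(e, 90) / np.percentile(e, 10)))


def qa_gate(x, min_audio_sec, min_snr_db=10.0, sr=SR):
    """Return (passed, reason); reject instead of scoring low-quality audio."""
    if len(trim_silence(x)) < int(min_audio_sec * sr):
        return False, "too_short"
    if estimate_snr_db(x) < min_snr_db:
        return False, "low_snr"
    return True, "ok"
```

In use, `qa_gate(audio, 4.0)` would precede `bank_top_security` scoring and `qa_gate(audio, 2.0)` the other two modes. Any `False` result maps to the "prompt re-record" action in the table above.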


Threshold Selection Guide

| Scenario | Recommended Mode | Recommended Option |
|---|---|---|
| Wire transfers > $10,000 | bank_top_security | B (Maximum) |
| Standard banking operations | bank_top_security | A (Balanced) |
| Mobile banking / eKYC | bank_flex | A (Balanced) |
| Call center verification | bank_flex | B (Secure) |
| FinTech regulatory compliance | bank_flex | C (Maximum) |
| Smart home / IoT | consumer | A (UX Priority) |
| App login (general) | consumer | B (Balanced) |
| Airport / National security | bank_top_security | B (Maximum) |
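
The selection guide can be encoded as a lookup that returns everything the pipeline needs for a request. Thresholds and minimum durations come from the three mode tables above; the scenario keys are hypothetical.

```python
# Thresholds and minimum durations copied from the three mode tables above.
MODE_THRESHOLDS = {
    "bank_top_security": {"A": -0.96, "B": -0.98},
    "bank_flex": {"A": -0.95, "B": -0.97, "C": -0.98},
    "consumer": {"A": -0.90, "B": -0.92, "C": -0.95},
}
MIN_AUDIO_SEC = {"bank_top_security": 4.0, "bank_flex": 2.0, "consumer": 2.0}

# Hypothetical scenario keys mirroring the Threshold Selection Guide rows.
SCENARIOS = {
    "wire_transfer_over_10k": ("bank_top_security", "B"),
    "standard_banking": ("bank_top_security", "A"),
    "mobile_banking_ekyc": ("bank_flex", "A"),
    "call_center": ("bank_flex", "B"),
    "fintech_compliance": ("bank_flex", "C"),
    "smart_home_iot": ("consumer", "A"),
    "app_login": ("consumer", "B"),
    "national_security": ("bank_top_security", "B"),
}


def resolve(scenario):
    """Map a scenario to the mode, threshold and minimum audio length to use."""
    mode, option = SCENARIOS[scenario]
    return {
        "mode": mode,
        "option": option,
        "threshold": MODE_THRESHOLDS[mode][option],
        "min_audio_sec": MIN_AUDIO_SEC[mode],
    }


print(resolve("call_center"))
```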

🛡️ Training Methodology — The "Resilience" Engine

BrightoSV V1.5 is forged on a massive 15+ million sample dataset covering 200+ types of attacks, multiple languages, accents, and recording conditions. Our proprietary training methodology creates a truly resilient model through 4 technical pillars:

1️⃣ Environmental & Acoustic Resilience

We simulate billions of real-world acoustic environments (reverb, room impulse responses) and chaotic noise profiles. The model is trained to stay robust down to 5 dB SNR, so it focuses on the source identity rather than on background noise.

| Category | Examples | Technical Goal |
|---|---|---|
| 🐾 Animals | Dog barking, rooster crowing, pig grunting, cat meowing | Resilience against sudden biological sounds |
| 🏠 Home | Ticking clocks, vacuum cleaners, vibrating washing machines | Reliability in work-from-home environments |
| 🗣️ Human (Hardest) | Coughing, sneezing, baby crying, footsteps | Disentangle the target speaker from human artifacts |
| 🏙️ Urban | Street noise, sirens, car horns, trains | Robustness for on-the-go verification |
| ⛈️ Natural | Heavy rain, wind gusts, thunder | Stability in outdoor environments |
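
The card states that training covers noise down to 5 dB SNR, but not how noise is mixed. A common way to add a noise clip (from categories like those above) at a target SNR is to rescale it by the power ratio. This sketch assumes that approach.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so that speech power / noise power = snr_db."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(speech):  # loop short noise clips
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise


rng = np.random.default_rng(0)
speech = 0.5 * np.sin(2 * np.pi * 220 * np.arange(16_000) / 16_000)
noisy = mix_at_snr(speech, rng.standard_normal(4_000), snr_db=5.0)  # hardest trained SNR
```

During training, `snr_db` would typically be sampled per example from a range whose lower end is 5 dB. The upper end is not stated in the card.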

2️⃣ Telephony & Channel Robustness

By simulating various transmission codecs (GSM, VoIP – Zalo/WhatsApp, MP3, AAC) and microphone distortions (RawBoost), the model learns to distinguish between benign compression artifacts and malicious deepfake artifacts. It won't fail just because the user has a poor connection.

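GSM, VoIP and AAC codecs need external libraries. The idea of benign channel distortion can still be illustrated with 8-bit mu-law companding, the scheme behind G.711 narrowband telephony. This sketch uses the continuous mu-law curve with a uniform quantizer, a simplification of the segmented G.711 tables, and is not BrightoSV's augmentation code.

```python
import numpy as np


def mulaw_roundtrip(x, mu=255, bits=8):
    """Compress, quantize to 2**bits levels and expand (G.711 mu-law style)."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)    # compand to [-1, 1]
    levels = 2 ** bits - 1
    y_q = np.round((y + 1) / 2 * levels) / levels * 2 - 1        # uniform quantizer
    return np.sign(y_q) * np.expm1(np.abs(y_q) * np.log1p(mu)) / mu  # expand


x = np.linspace(-1.0, 1.0, 1001)
degraded = mulaw_roundtrip(x)
print(float(np.max(np.abs(degraded - x))))  # small, signal-dependent error
```

The error is largest for loud samples and smallest near zero. That uneven, benign distortion is the kind of channel artifact the model must learn to tolerate.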

3️⃣ Artifact Preservation Strategy: Asymmetric Learning

Unlike standard approaches that "over-clean" data, our pipeline uses a surgical approach:

  • REAL speech: Augmented heavily → builds robustness
  • FAKE speech: Treated delicately → preserves microscopic digital artifacts (vocoder buzz, phase discontinuities)

This ensures modern Neural Codec traces are never washed away during training.

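The asymmetric policy can be expressed as a label-dependent augmentation step. The sketch below uses hypothetical names and probabilities: `p_heavy` and `p_light` are illustrative, not published values. The augmentation callables would be things like the noise mixing and codec simulation described in pillars 1 and 2.

```python
import random


def asymmetric_augment(wave, is_real, heavy_augs, light_augs, rng,
                       p_heavy=0.9, p_light=0.2):
    """REAL: chain many strong augmentations. FAKE: rarely, one mild one.

    heavy_augs / light_augs are lists of callables wave -> wave (hypothetical,
    e.g. noise mixing, RIR reverb, codec simulation, RawBoost for REAL).
    """
    if is_real:
        for aug in heavy_augs:
            if rng.random() < p_heavy:
                wave = aug(wave)
    elif light_augs and rng.random() < p_light:
        # Keep FAKE processing minimal so vocoder buzz and phase
        # discontinuities are not washed out.
        wave = rng.choice(light_augs)(wave)
    return wave


def tag(name):
    return lambda w: w + [name]


rng = random.Random(0)
print(asymmetric_augment([], True, [tag("noise"), tag("rir")], [tag("gain")], rng, p_heavy=1.0))
```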

4️⃣ Multilingual Generalization

Trained on a diverse multilingual corpus (Vietnamese, English, Chinese, German, French, Japanese, Arabic, Dutch...), the model focuses on universal spoofing traces rather than language-specific phonemes, offering global protection regardless of language or accent.

⚙️ Technical Specifications

| Specification | Value |
|---|---|
| Model Version | V1.5 Special Edition (Commercial SOTA) |
| Architecture | Audio Self-Supervised Learning Base + GAT |
| Parameters | 316M (High-Capacity Backbone) |
| Input Sample Rate | 16 kHz (auto-resampling supported) |
| Input Formats | WAV, MP3, OGG, FLAC, M4A |
| Min Duration | 2.0 s (flex) / 4.0 s (top_security) |
| Min SNR | 10.0 dB |
| Output | Cosine similarity score [−1.0, +1.0] |

🚀 Hardware & Performance

| Specification | Value |
|---|---|
| GPU Support | NVIDIA GTX 1080+, A10G, A100, H100, L4, T4 |
| CPU Support | Intel Xeon, AMD EPYC (via ONNX/OpenVINO) |
| Inference Latency | < 50 ms (RTX 3090) |
| Model Size | ~1.2 GB |
| Batch Processing | Supported for enterprise throughput |
| Concurrent Requests | High-density deployment ready |

🌍 Application Scenarios

Optimized for sectors requiring maximum security:

| Sector | Use Case | Recommended Mode |
|---|---|---|
| 🏦 Banking & Finance | Wire transfers, voice banking | bank_top_security |
| 🆔 eKYC | Customer onboarding, liveness | bank_flex |
| 🪙 Crypto & FinTech | Wallet protection, transaction authorization | bank_top_security |
| ✈️ National Security | Border control, immigration | bank_top_security |
| 🚓 Corrections | Inmate communication monitoring | bank_flex |
| 🎧 Call Centers | Real-time deepfake detection | bank_flex |
| 📱 Consumer Apps | Voice login, smart home | consumer |

📈 Version History

| Version | Release | Highlight | Status |
|---|---|---|---|
| V1.5 SE | Feb 2026 | Commercial SOTA: 34-42% LAV-DF improvement | 🟢 Current |
| V1.3 | Jan 2026 | LAV-DF breakthrough (66% → 5% tail) | Production |
| V1.2 | Dec 2025 | Bank-grade baseline | Legacy |

🔒 Security & Compliance

| Aspect | Implementation |
|---|---|
| Deployment | On-premise (Docker/K8s) or private cloud |
| Data Privacy | Zero retention: audio processed in RAM only |
| Encryption | End-to-end TLS 1.3 |
| Compliance | GDPR, PCI-DSS, SOC 2 ready |
| Audit Trail | Full logging capability (no audio stored) |

📞 Access & Licensing

This model is private and available exclusively to enterprise partners under NDA.

Commercial Terms & Deployment

  • Full-package license or API access
  • Integration support on request (deployment, performance optimization, quality monitoring)
  • SphinX Joint Stock Company (sphinxjsc.com) is authorized to package, provide API access to, and distribute the model

Copyright & License

Commercial / Proprietary. Use, redistribution, or creation of derivative works requires written approval from BrighTO Technology.

Contact

| Purpose | Contact |
|---|---|
| Commercial Licensing | nguyen@brighto.ai, nghia@brighto.ai |
| API & Distribution | duc@sphinxjsc.com (SphinX JSC) |
| Technical Inquiries | nguyen@hatto.com |

🏆 BrightoSV V1.5 Special Edition

Commercial SOTA • Global Ready • Bank-Grade Security

Built in Vietnam 🇻🇳 • Engineered for the World 🌏


This model card refers to BrightoSV Anti-Spoof V1.5 Special Edition (Commercial SOTA Release). All benchmark results are verified on internal test sets comprising 400,000+ open-set samples with strict bank-grade QA methodology.
