Cybersecurity NER Model

Named entity recognition (NER) model for the cybersecurity domain. Overall F1: 98.31%.

Model Details

  • Version: v5
  • Framework: spaCy 3.8+
  • Training date: 2025-12-29
  • Examples: 1922 (stratified 80/10/10)
  • Backbone: domain-adapted RoBERTa

Entities (13)

Entity             F1       Examples
CERTIFICATION      100%     CISSP, OSCP, CEH
SECURITY_ROLE      100%     CISO, SOC Analyst
SECURITY_TOOL      100%     Splunk, Metasploit
ATTACK_TECHNIQUE   100%     SQL Injection, XSS
FRAMEWORK          100%     NIST CSF, ISO 27001
THREAT_TYPE        100%     APT, ransomware
AUDIT_TERM         100%     Compliance, Audit
CVE                100%     CVE-2021-44228
SECURITY_DOMAIN    99.10%   Cloud Security
TECHNICAL_SKILL    95.30%   Incident Response
REGULATION         94.44%   GDPR, HIPAA
ACRONYM            88.89%   SIEM, EDR
CONTROL_ID         0%       See hybrid approach

Performance

Metrics:

  • F1: 98.31%
  • Precision: 97.92%
  • Recall: 98.69%
  • Inference: ~60ms/doc
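
As a quick consistency check on the figures above, F1 can be recomputed as the harmonic mean of precision and recall. The sketch below assumes the reported F1 is the standard harmonic-mean F1; the function name `f1_score` is illustrative and not part of the model's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values from the metrics list, expressed as fractions.
f1 = f1_score(0.9792, 0.9869)
print(f"{f1:.2%}")  # prints 98.30%
```

The recomputed value (98.30%) sits within 0.01pp of the reported 98.31%. A gap of that size is what one would expect if the published precision and recall were themselves rounded to two decimals.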

v5 changes from v4:

  • Tuned hyperparameters (dropout 0.25, L2 0.02)
  • Improved REGULATION (+6.64pp), ACRONYM (+22.22pp)
  • Overall +0.25pp F1

CONTROL_ID Handling

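The model scores 0% F1 on CONTROL_ID because the training data contains only 25 examples of this entity.

Solution: a hybrid approach. For production use, control IDs are extracted with regular expressions rather than by the model.

Patterns cover ISO 27001, NIST CSF, CIS Controls, SOC 2, and PCI-DSS.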

See service implementation for details.
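
The service implementation is not reproduced here, so the following is only a minimal sketch of what regex extraction for control IDs might look like. Every pattern, the `ControlMatch` type, and the `extract_control_ids` function are assumptions made for illustration. They are not the patterns the service actually uses, and they cover only a few common identifier shapes per framework.

```python
import re
from typing import List, NamedTuple


class ControlMatch(NamedTuple):
    start: int   # character offset where the match begins
    end: int     # character offset just past the match
    text: str    # matched control identifier
    source: str  # name of the illustrative pattern that matched


# Hypothetical patterns, one per framework named in this section.
# They are assumptions for illustration, not the service's real rules.
CONTROL_ID_PATTERNS = {
    # ISO 27001 Annex A style, e.g. "A.9.2.3" or "A.5.1"
    "ISO_27001": re.compile(r"\bA\.\d{1,2}(?:\.\d{1,2}){1,2}\b"),
    # NIST CSF subcategory style, e.g. "PR.AC-1"
    "NIST_CSF": re.compile(r"\b(?:ID|PR|DE|RS|RC|GV)\.[A-Z]{2}-\d{1,2}\b"),
    # CIS Controls style, e.g. "CIS Control 5.1"
    "CIS_CONTROLS": re.compile(r"\bCIS (?:Control )?\d{1,2}(?:\.\d{1,2})?\b"),
    # SOC 2 common criteria style, e.g. "CC6.1"
    "SOC_2": re.compile(r"\bCC\d\.\d{1,2}\b"),
    # PCI-DSS requirement style, e.g. "PCI DSS 8.2.1"
    "PCI_DSS": re.compile(
        r"\bPCI[- ]DSS (?:Req(?:uirement)? )?\d{1,2}(?:\.\d{1,2}){0,2}\b"
    ),
}


def extract_control_ids(text: str) -> List[ControlMatch]:
    """Return all pattern matches in `text`, sorted by start offset."""
    matches = []
    for source, pattern in CONTROL_ID_PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append(ControlMatch(m.start(), m.end(), m.group(), source))
    return sorted(matches)


sample = "Access control maps to ISO 27001 A.9.2.3, NIST CSF PR.AC-1 and SOC 2 CC6.1."
for match in extract_control_ids(sample):
    print(f"{match.text:20} | CONTROL_ID ({match.source})")
```

If the offsets need to become spaCy entities, `Doc.char_span` can turn a (start, end) pair into a span, and `spacy.util.filter_spans` can resolve overlaps with model predictions before assigning `doc.ents`. Whether the service merges results this way is not stated in this card.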

Usage

pip install "spacy>=3.7.0" "spacy-transformers>=1.3.0"

import spacy

nlp = spacy.load("pki/ner-cybersecurity")
doc = nlp("CISO with CISSP, expert in Splunk and ISO 27001")

for ent in doc.ents:
    print(f"{ent.text:20} | {ent.label_}")

Output:

CISO                 | SECURITY_ROLE
CISSP                | CERTIFICATION
Splunk               | SECURITY_TOOL
ISO 27001            | FRAMEWORK

Use Cases

  • Job/CV matching
  • Threat intelligence extraction
  • Compliance documentation parsing
  • Security policy analysis
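
For the job/CV matching use case, one possible approach is to group the extracted entities by label and compare the sets found in two documents. The sketch below is an assumption about how that could be done, not a description of any existing pipeline. `group_entities` and `label_overlap` are hypothetical helpers, and the `SimpleNamespace` objects stand in for spaCy entity spans so the example runs without the model.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, Set


def group_entities(ents: Iterable) -> Dict[str, Set[str]]:
    """Group lowercased entity texts by label.

    Accepts any objects exposing `.text` and `.label_`, such as the
    spans in `doc.ents`.
    """
    grouped = defaultdict(set)
    for ent in ents:
        grouped[ent.label_].add(ent.text.lower())
    return dict(grouped)


def label_overlap(a: Dict[str, Set[str]], b: Dict[str, Set[str]], label: str) -> float:
    """Jaccard overlap between two grouped-entity dicts for one label."""
    x, y = a.get(label, set()), b.get(label, set())
    union = x | y
    return len(x & y) / len(union) if union else 0.0


# Stand-ins for entities the model might return for a job ad and a CV.
job_ents = [
    SimpleNamespace(text="CISSP", label_="CERTIFICATION"),
    SimpleNamespace(text="OSCP", label_="CERTIFICATION"),
]
cv_ents = [SimpleNamespace(text="cissp", label_="CERTIFICATION")]

job, cv = group_entities(job_ents), group_entities(cv_ents)
print(label_overlap(job, cv, "CERTIFICATION"))  # prints 0.5
```

With a loaded pipeline, `group_entities(nlp(text).ents)` would take the place of the stand-in lists.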

Training Config

max_steps = 8000
dropout = 0.25
L2 = 0.02
learning_rate = 0.00003
hidden_width = 128
maxout_pieces = 3
batch_size = 128

Limitations

  • ACRONYM: Lower F1 (88.89%) - limited examples (46)
  • CONTROL_ID: Requires hybrid regex approach
  • Domain-specific: Optimized for cybersecurity text
  • Context-dependent ambiguity on some terms

License

MIT

Version History

Version  Date        F1      Examples  Notes
v5       2025-12-29  98.31%  1922      Hyperparameter tuning
v4       2025-12-29  98.06%  1922      Stratified split, domain RoBERTa
v3       2025-01     69.4%   1000      spaCy 3.x migration
v2       2024-12     99.5%*  1805      spaCy 2.x (*train accuracy)

Contact

Issues: Model repository
