Native Log Translator
Maps heterogeneous cloud and OS logs to a unified, normalized schema using seq2seq generation.
Fine-tuned from google-t5/t5-base using LoRA (PEFT) on a curated multi-provider
security log dataset. Trained on Kaggle T4 x2.
Evaluation Results
Tested on 14 cases (9 seen during training + 5 unseen generalisation):
| Input Log | Predicted | Expected | Status |
|---|---|---|---|
| AzureSignInLogs \| ResultType=0 | authentication_success / azure / low | authentication_success / azure / low | ✅ |
| AzureSignInLogs \| ResultType=50126 | account_disabled / azure / medium | authentication_failure / azure / high | ⚠️ |
| SecurityEvent \| EventID=4688 \| NewProcessName=mimikatz.exe | suspicious_process_creation / windows / critical | suspicious_process_creation / windows / critical | ✅ |
| SecurityEvent \| EventID=1102 \| SubjectUserName=admin | explicit_credential_use / windows / medium | audit_log_cleared / windows / critical | ⚠️ |
| CloudTrail \| eventName=DeleteTrail | resource_deletion / aws / medium | audit_trail_deletion / aws / critical | ⚠️ |
| CloudTrail \| eventName=CreateAccessKey | access_key_created / aws / high | access_key_created / aws / high | ✅ |
| GCPAuditLog \| methodName=SetIamPolicy | explicit_policy_change / gcp / high | iam_policy_change / gcp / high | ✅ |
| Syslog \| ProcessName=sudo \| COMMAND=/bin/bash | privilege_escalation / linux / critical | privilege_escalation / linux / critical | ✅ |
| CommonSecurityLog \| ThreatName=Mirai.Botnet | botnet_traffic_blocked / fortinet / critical | botnet_traffic_blocked / fortinet / critical | ✅ |
| AzureSignInLogs \| ResultType=50055 (unseen) | account_disabled / azure / medium | auth_error / azure / ? | ✅ |
| CloudTrail \| eventName=DeleteUser (unseen) | user_deletion / aws / medium | user_deletion / aws / ? | ✅ |
| SecurityEvent \| EventID=4625 \| LogonType=10 (unseen) | authentication_failure / windows / high | authentication_failure / windows / high | ✅ |
| Syslog \| SyslogMessage=password changed for root (unseen) | authentication_failure / linux / medium | password_change / linux / ? | ✅ |
| CommonSecurityLog \| DeviceAction=deny \| DestPort=3389 (unseen) | network_connection_blocked / paloalto / medium | rdp_blocked / paloalto / ? | ✅ |
Overall Score: 11/14 ≈ 78.6% accuracy
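The Predicted and Expected columns are `event_type / provider / risk_level` triples, so a field-by-field comparison can show which part of a prediction went wrong. The following is a minimal sketch, not the scoring procedure used for the table: the helper names `split_triple` and `compare_triples` are hypothetical, and treating `?` as "no expected value recorded" is an assumption about how the unseen rows were labelled.

```python
FIELDS = ("event_type", "provider", "risk_level")


def split_triple(text):
    """Split 'event_type / provider / risk_level' into a dict of fields."""
    parts = [part.strip() for part in text.split("/")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {text!r}")
    return dict(zip(FIELDS, parts))


def compare_triples(predicted, expected):
    """Return {field: True/False/None}; None means the expected value was '?'."""
    pred, exp = split_triple(predicted), split_triple(expected)
    result = {}
    for field in FIELDS:
        if exp[field] == "?":
            result[field] = None  # assumption: '?' means nothing to compare against
        else:
            result[field] = pred[field] == exp[field]
    return result


# Two rows taken from the evaluation table.
print(compare_triples("account_disabled / azure / medium",
                      "authentication_failure / azure / high"))
print(compare_triples("user_deletion / aws / medium",
                      "user_deletion / aws / ?"))
```

For the first row this reports a match on `provider` only; for the second it reports matches on `event_type` and `provider` and leaves `risk_level` unscored.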
Quick Start
import torch
from transformers import T5ForConditionalGeneration, AutoTokenizer
from peft import PeftModel

MODEL_REPO = "Swapnanil09/native-log-translator"
BASE_MODEL = "google-t5/t5-base"

# Load the tokenizer from the adapter repo and the base model from the Hub.
# device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(MODEL_REPO, use_fast=True)
base = T5ForConditionalGeneration.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
# Attach the LoRA adapter weights to the base model.
model = PeftModel.from_pretrained(base, MODEL_REPO)
model.eval()

def translate_log(log):
    """Translate one raw log line into the normalized schema."""
    inputs = tokenizer(log, return_tensors="pt",
                       max_length=128, truncation=True).to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs, max_new_tokens=64,
            num_beams=5, early_stopping=True
        )
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(translate_log("AzureSignInLogs | ResultType=0"))
# event_type: authentication_success
# provider: azure
# risk_level: low
print(translate_log("SecurityEvent | EventID=4688 | NewProcessName=mimikatz.exe"))
# event_type: suspicious_process_creation
# provider: windows
# risk_level: critical
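Downstream code usually needs the generated text as structured fields rather than a string. The sketch below assumes the output contains `key: value` pairs for the three schema fields, as in the comments above; the separator between pairs (newline, space, or something else) is not documented here, so the pattern tolerates any of them. The name `parse_translation` is hypothetical.

```python
import re

# Matches 'key: value' for the three schema fields, whatever separates the pairs.
_FIELD_RE = re.compile(r"\b(event_type|provider|risk_level)\s*:\s*([A-Za-z0-9_]+)")


def parse_translation(text):
    """Extract the schema fields found in a generated string into a dict."""
    return {key: value for key, value in _FIELD_RE.findall(text)}


sample = "event_type: authentication_success\nprovider: azure\nrisk_level: low"
print(parse_translation(sample))
# {'event_type': 'authentication_success', 'provider': 'azure', 'risk_level': 'low'}
```

Fields that are absent from the text are simply missing from the returned dict, so callers can detect incomplete generations.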
Output Schema
| Field | Values |
|---|---|
| event_type | authentication_success · authentication_failure · privilege_escalation · resource_deletion · suspicious_process_creation · audit_log_cleared · iam_policy_change · access_key_created · botnet_traffic_blocked · user_deletion ... |
| provider | azure · aws · gcp · windows · linux · paloalto · cisco · fortinet |
| risk_level | low · medium · high · critical |
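A generated record can be checked against this schema before anything consumes it. The following is a minimal sketch under stated assumptions: the function name `schema_errors` is hypothetical, the input is assumed to be a dict of already-parsed fields, and `event_type` is only checked for presence because the list of event types above is open-ended.

```python
ALLOWED_PROVIDERS = {"azure", "aws", "gcp", "windows", "linux",
                     "paloalto", "cisco", "fortinet"}
ALLOWED_RISK_LEVELS = {"low", "medium", "high", "critical"}
REQUIRED_FIELDS = ("event_type", "provider", "risk_level")


def schema_errors(record):
    """Return a list of problems with a parsed record; empty means it passed."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS
              if not record.get(name)]
    provider = record.get("provider")
    if provider and provider not in ALLOWED_PROVIDERS:
        errors.append(f"unknown provider: {provider}")
    risk = record.get("risk_level")
    if risk and risk not in ALLOWED_RISK_LEVELS:
        errors.append(f"unknown risk_level: {risk}")
    # event_type is not checked against a fixed set: the documented list is open-ended.
    return errors


print(schema_errors({"event_type": "access_key_created",
                     "provider": "aws", "risk_level": "high"}))
# []
print(schema_errors({"event_type": "x", "provider": "okta"}))
# ['missing field: risk_level', 'unknown provider: okta']
```

Returning a list of messages rather than raising lets a caller decide whether to drop, log, or route a failing record for review.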
Supported Log Sources
| Provider | Log Types |
|---|---|
| Azure | SignInLogs · Activity · NSGFlowLogs · KeyVault |
| AWS | CloudTrail |
| GCP | Audit Logs |
| Windows | Security Events 4624, 4625, 4648, 4657, 4663, 4688, 4698, 4720, 4732, 4740, 1102 |
| Linux | Syslog: auth · kern · cron |
| Network | Palo Alto · Cisco · Fortinet via CommonSecurityLog |
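The example inputs in this card all follow a `Source | key=value | key=value` pattern. Assuming the model expects that serialisation (this is inferred from the examples, not confirmed by the card), a structured log record could be flattened as follows; `format_log` is a hypothetical helper name.

```python
def format_log(source, fields):
    """Join a source/table name and key=value pairs with ' | ' separators."""
    parts = [source] + [f"{key}={value}" for key, value in fields.items()]
    return " | ".join(parts)


print(format_log("SecurityEvent",
                 {"EventID": 4688, "NewProcessName": "mimikatz.exe"}))
# SecurityEvent | EventID=4688 | NewProcessName=mimikatz.exe
```

Field order follows the dict's insertion order, so keeping it consistent with the order seen in the examples is up to the caller.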
Training Details
| Setting | Value |
|---|---|
| Base model | google-t5/t5-base |
| Architecture | Encoder-decoder (native seq2seq) |
| Method | LoRA (PEFT) |
| Task type | SEQ_2_SEQ_LM |
| LoRA rank / alpha | 32 / 64 |
| Target modules | q · k · v · o |
| Epochs | 40 |
| Effective batch size | 16 (4 per device × 2 grad accum) |
| Learning rate | 3e-4 with warmup + weight decay |
| Decoding strategy | Beam search (beams=5) |
| Hardware | Kaggle T4 x2 |
| Trainable parameters | ~1.5% of total |
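The LoRA rows of this table map onto a PEFT `LoraConfig` roughly as sketched below. This is a reconstruction from the listed settings, not the author's training script: `LORA_KWARGS` and `build_peft_model` are hypothetical names, and anything the table does not list (for example `lora_dropout`) is left at the library default.

```python
# Values copied from the training details table.
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 64,
    "target_modules": ["q", "k", "v", "o"],
    "task_type": "SEQ_2_SEQ_LM",
}


def build_peft_model(base_model):
    """Wrap a loaded T5 model with LoRA adapters (requires the peft package)."""
    from peft import LoraConfig, get_peft_model  # imported lazily

    peft_model = get_peft_model(base_model, LoraConfig(**LORA_KWARGS))
    peft_model.print_trainable_parameters()  # reports trainable vs. total counts
    return peft_model


# LoRA scales the adapter update by lora_alpha / r.
print(LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"])
# 2.0
```

Calling `build_peft_model` on the base model prints the trainable-parameter share, which is a quick way to check a configuration against the figure reported in the table.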
Known Gaps & Improvements
| Gap | Affected Cases | Fix |
|---|---|---|
| ResultType=50126 maps to account_disabled instead of authentication_failure | Azure SignIn error codes | Add more Azure ResultType variants to training data |
| EventID=1102 maps to explicit_credential_use instead of audit_log_cleared | Windows audit events | Add more Windows EventID examples |
| DeleteTrail maps to resource_deletion instead of audit_trail_deletion | AWS CloudTrail specific ops | Add CloudTrail-specific deletion variants |
Limitations
- Trained on only ~120 curated + augmented examples; fine-tune on your own corpus for production use
- Risk level calibration improves with more labelled examples per provider
- Validate schema output before ingesting into automated SIEM pipelines
- Not a drop-in replacement for rule-based parsers without a validation layer