🔐 Native Log Translator

Maps heterogeneous cloud, OS, and network-device logs to a unified, normalized schema using sequence-to-sequence (seq2seq) generation.

Fine-tuned from google-t5/t5-base using LoRA (PEFT) on a curated multi-provider security log dataset. Trained on Kaggle T4 x2.

📊 Evaluation Results

Tested on 14 cases: 9 seen during training and 5 unseen cases that probe generalisation.

| Input Log | Predicted | Expected | Status |
|---|---|---|---|
| `AzureSignInLogs \| ResultType=0` | authentication_success / azure / low | authentication_success / azure / low | ✅ |
| `AzureSignInLogs \| ResultType=50126` | account_disabled / azure / medium | authentication_failure / azure / high | ⚠️ |
| `SecurityEvent \| EventID=4688 \| NewProcessName=mimikatz.exe` | suspicious_process_creation / windows / critical | suspicious_process_creation / windows / critical | ✅ |
| `SecurityEvent \| EventID=1102 \| SubjectUserName=admin` | explicit_credential_use / windows / medium | audit_log_cleared / windows / critical | ⚠️ |
| `CloudTrail \| eventName=DeleteTrail` | resource_deletion / aws / medium | audit_trail_deletion / aws / critical | ⚠️ |
| `CloudTrail \| eventName=CreateAccessKey` | access_key_created / aws / high | access_key_created / aws / high | ✅ |
| `GCPAuditLog \| methodName=SetIamPolicy` | explicit_policy_change / gcp / high | iam_policy_change / gcp / high | ✅ |
| `Syslog \| ProcessName=sudo \| COMMAND=/bin/bash` | privilege_escalation / linux / critical | privilege_escalation / linux / critical | ✅ |
| `CommonSecurityLog \| ThreatName=Mirai.Botnet` | botnet_traffic_blocked / fortinet / critical | botnet_traffic_blocked / fortinet / critical | ✅ |
| `AzureSignInLogs \| ResultType=50055` (unseen) | account_disabled / azure / medium | auth_error / azure / ? | ✅ |
| `CloudTrail \| eventName=DeleteUser` (unseen) | user_deletion / aws / medium | user_deletion / aws / ? | ✅ |
| `SecurityEvent \| EventID=4625 \| LogonType=10` (unseen) | authentication_failure / windows / high | authentication_failure / windows / high | ✅ |
| `Syslog \| SyslogMessage=password changed for root` (unseen) | authentication_failure / linux / medium | password_change / linux / ? | ✅ |
| `CommonSecurityLog \| DeviceAction=deny \| DestPort=3389` (unseen) | network_connection_blocked / paloalto / medium | rdp_blocked / paloalto / ? | ✅ |

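Overall score: 11/14 (78.6% accuracy). Several ✅ rows, such as `GCPAuditLog | methodName=SetIamPolicy`, have a predicted `event_type` that differs from the expected label, so the score reflects a looser criterion than exact match.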
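
A minimal sketch of one possible scoring rule, assuming predictions and references use the `event_type / provider / risk_level` layout from the table and that `?` means no reference value is available. The names `split_label` and `score_case` are hypothetical, and this strict field-by-field rule is not necessarily the criterion behind the ✅ marks in the table.

```python
FIELDS = ("event_type", "provider", "risk_level")

def split_label(label: str) -> dict[str, str]:
    """Split 'event_type / provider / risk_level' into a field dict."""
    parts = [p.strip() for p in label.split("/")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {label!r}")
    return dict(zip(FIELDS, parts))

def score_case(predicted: str, expected: str) -> dict[str, bool]:
    """Compare each field; a '?' in the reference matches any prediction."""
    pred, ref = split_label(predicted), split_label(expected)
    return {f: ref[f] == "?" or pred[f] == ref[f] for f in FIELDS}

# Three (predicted, expected) pairs copied from the evaluation table.
cases = [
    ("authentication_success / azure / low", "authentication_success / azure / low"),
    ("resource_deletion / aws / medium", "audit_trail_deletion / aws / critical"),
    ("user_deletion / aws / medium", "user_deletion / aws / ?"),
]
results = [score_case(p, e) for p, e in cases]
exact = sum(all(r.values()) for r in results)
print(f"exact match: {exact}/{len(cases)}")
for field in FIELDS:
    hits = sum(r[field] for r in results)
    print(f"{field}: {hits}/{len(cases)}")
```

Reporting per-field hits alongside exact match separates event-type errors from risk-level calibration errors, which the single overall score cannot do.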

🚀 Quick Start

```python
import torch
from transformers import T5ForConditionalGeneration, AutoTokenizer
from peft import PeftModel

MODEL_REPO = "Swapnanil09/native-log-translator"  # LoRA adapter and tokenizer
BASE_MODEL = "google-t5/t5-base"                   # base weights the adapter was trained on

tokenizer = AutoTokenizer.from_pretrained(MODEL_REPO, use_fast=True)
# device_map="auto" requires the `accelerate` package.
# float16 is intended for GPU inference; on CPU, torch.float32 is the safer choice.
base = T5ForConditionalGeneration.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
# Attach the LoRA adapter to the base model and switch to evaluation mode.
model = PeftModel.from_pretrained(base, MODEL_REPO)
model.eval()

def translate_log(log: str) -> str:
    """Translate one raw log line into the normalized schema string."""
    inputs = tokenizer(log, return_tensors="pt",
                       max_length=128, truncation=True).to(model.device)
    with torch.no_grad():
        # Beam search with 5 beams, matching the decoding strategy used for evaluation.
        out = model.generate(
            **inputs, max_new_tokens=64,
            num_beams=5, early_stopping=True
        )
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(translate_log("AzureSignInLogs | ResultType=0"))
# event_type: authentication_success
# provider: azure
# risk_level: low

print(translate_log("SecurityEvent | EventID=4688 | NewProcessName=mimikatz.exe"))
# event_type: suspicious_process_creation
# provider: windows
# risk_level: critical
```
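
The decoded string still has to be split into fields before downstream use. A minimal sketch, assuming the output follows the `key: value` layout shown in the comments of the example; `parse_translation` is a hypothetical helper, not part of the released model, and the pattern accepts pairs separated by either newlines or spaces because the exact separator is not documented.

```python
import re

FIELDS = ("event_type", "provider", "risk_level")
# Match "field: value" pairs whether they are separated by newlines or spaces.
_PAIR = re.compile(r"(event_type|provider|risk_level)\s*:\s*([A-Za-z0-9_]+)")

def parse_translation(text: str) -> dict[str, str]:
    """Extract the three schema fields from a decoded model output."""
    record = {key: value.lower() for key, value in _PAIR.findall(text)}
    missing = [f for f in FIELDS if f not in record]
    if missing:
        raise ValueError(f"missing fields {missing} in output: {text!r}")
    return record

sample = "event_type: authentication_success\nprovider: azure\nrisk_level: low"
print(parse_translation(sample))
```

Raising on a missing field, rather than returning a partial record, keeps truncated generations from passing silently into later stages.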

📋 Output Schema

| Field | Values |
|---|---|
| `event_type` | `authentication_success` · `authentication_failure` · `privilege_escalation` · `resource_deletion` · `suspicious_process_creation` · `audit_log_cleared` · `iam_policy_change` · `access_key_created` · `botnet_traffic_blocked` · `user_deletion` ... |
| `provider` | `azure` · `aws` · `gcp` · `windows` · `linux` · `paloalto` · `cisco` · `fortinet` |
| `risk_level` | `low` · `medium` · `high` · `critical` |
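
A minimal validation sketch based on the schema table. The `provider` and `risk_level` sets are copied from the table; because the `event_type` list is open-ended, the sketch only checks that the value is a snake_case identifier. `validate_record` is a hypothetical name.

```python
import re

PROVIDERS = {"azure", "aws", "gcp", "windows", "linux", "paloalto", "cisco", "fortinet"}
RISK_LEVELS = {"low", "medium", "high", "critical"}
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")

def validate_record(record: dict[str, str]) -> list[str]:
    """Return a list of schema violations; an empty list means the record is valid."""
    errors = []
    event_type = record.get("event_type", "")
    if not _SNAKE_CASE.fullmatch(event_type):
        errors.append(f"event_type is not snake_case: {event_type!r}")
    if record.get("provider") not in PROVIDERS:
        errors.append(f"unknown provider: {record.get('provider')!r}")
    if record.get("risk_level") not in RISK_LEVELS:
        errors.append(f"unknown risk_level: {record.get('risk_level')!r}")
    return errors

ok = {"event_type": "access_key_created", "provider": "aws", "risk_level": "high"}
bad = {"event_type": "access_key_created", "provider": "aws", "risk_level": "?"}
print(validate_record(ok))   # no violations
print(validate_record(bad))  # one violation, for risk_level
```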

📦 Supported Log Sources

| Provider | Log Types |
|---|---|
| Azure | SignInLogs · Activity · NSGFlowLogs · KeyVault |
| AWS | CloudTrail |
| GCP | Audit Logs |
| Windows | Security Events: 4624 · 4625 · 4648 · 4657 · 4663 · 4688 · 4698 · 4720 · 4732 · 4740 · 1102 |
| Linux | Syslog: auth · kern · cron |
| Network | Palo Alto · Cisco · Fortinet (via CommonSecurityLog) |
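
The inputs in the Evaluation Results table follow a `Source | key=value | ...` pattern. A minimal sketch for building such strings from structured records, assuming this pattern is what the model expects; `format_log` is a hypothetical helper, and the exact training-time input format is not documented in this card.

```python
from collections.abc import Mapping

def format_log(source: str, fields: Mapping[str, object]) -> str:
    """Join a log source name and its fields as 'Source | key=value | ...'."""
    parts = [source] + [f"{key}={value}" for key, value in fields.items()]
    return " | ".join(parts)

print(format_log("SecurityEvent", {"EventID": 4688, "NewProcessName": "mimikatz.exe"}))
# SecurityEvent | EventID=4688 | NewProcessName=mimikatz.exe
```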

⚙️ Training Details

| Setting | Value |
|---|---|
| Base model | `google-t5/t5-base` |
| Architecture | Encoder-decoder (native seq2seq) |
| Method | LoRA (PEFT) |
| Task type | `SEQ_2_SEQ_LM` |
| LoRA rank / alpha | 32 / 64 |
| Target modules | q · k · v · o |
| Epochs | 40 |
| Effective batch size | 16 (4 per device × 2 GPUs × 2 gradient-accumulation steps) |
| Learning rate | 3e-4, with warmup and weight decay |
| Decoding strategy | Beam search (beams=5) |
| Hardware | Kaggle T4 x2 |
| Trainable parameters | ~1.5% of total |
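
As a rough cross-check of the adapter size, the LoRA parameter count can be estimated from the rank and target modules listed in the table. This sketch assumes the standard `t5-base` shape (hidden size 768, 12 encoder and 12 decoder layers, square attention projections) and roughly 223M base parameters; those numbers are assumptions, not taken from this card. Under them the adapter comes to about 3% of all parameters, so the reported ~1.5% may reflect a different rank or module selection.

```python
D_MODEL = 768              # assumed t5-base hidden size
ENCODER_LAYERS = 12        # assumed
DECODER_LAYERS = 12        # assumed
BASE_PARAMS = 223_000_000  # approximate t5-base size (assumed)

RANK = 32                  # LoRA rank from the table
TARGETS_PER_ATTENTION = 4  # q, k, v, o

# Encoder blocks have self-attention; decoder blocks have self- and cross-attention.
attention_blocks = ENCODER_LAYERS + 2 * DECODER_LAYERS
adapted_matrices = attention_blocks * TARGETS_PER_ATTENTION

# Each adapted d x d matrix gains two low-rank factors: A (rank x d) and B (d x rank).
params_per_matrix = RANK * (D_MODEL + D_MODEL)
lora_params = adapted_matrices * params_per_matrix

share = lora_params / (BASE_PARAMS + lora_params)
print(f"LoRA parameters: {lora_params:,}")
print(f"share of total: {share:.1%}")
```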

🔍 Known Gaps & Improvements

| Gap | Affected Cases | Fix |
|---|---|---|
| `ResultType=50126` maps to `account_disabled` instead of `authentication_failure` | Azure SignIn error codes | Add more Azure ResultType variants to training data |
| `EventID=1102` maps to `explicit_credential_use` instead of `audit_log_cleared` | Windows audit events | Add more Windows EventID examples |
| `DeleteTrail` maps to `resource_deletion` instead of `audit_trail_deletion` | AWS CloudTrail specific ops | Add CloudTrail-specific deletion variants |

⚠️ Limitations

- Trained on ~120 curated and augmented examples; fine-tune on your own corpus before production use.
- Risk-level calibration improves with more labelled examples per provider.
- Validate schema output before ingesting it into automated SIEM pipelines.
- Not a drop-in replacement for rule-based parsers without a validation layer.
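
One way to combine the model with rule-based parsing is to wrap the model call. A minimal sketch, assuming the caller supplies a function that returns parsed fields (for example a wrapper around the Quick Start `translate_log`); `guarded_translate`, `OVERRIDES`, and the `unknown` fallback record are hypothetical. The override entries reuse the expected labels from the Evaluation Results table for the three known gaps.

```python
from typing import Callable

RISK_LEVELS = {"low", "medium", "high", "critical"}

# Deterministic rules for inputs the model is known to mislabel (hypothetical table).
OVERRIDES = {
    "EventID=1102": {"event_type": "audit_log_cleared", "provider": "windows", "risk_level": "critical"},
    "eventName=DeleteTrail": {"event_type": "audit_trail_deletion", "provider": "aws", "risk_level": "critical"},
    "ResultType=50126": {"event_type": "authentication_failure", "provider": "azure", "risk_level": "high"},
}

def guarded_translate(log: str, model_fn: Callable[[str], dict[str, str]]) -> dict[str, str]:
    """Apply rule overrides first, then the model, then a basic sanity check."""
    # Compare whole 'key=value' tokens so EventID=1102 does not match EventID=11020.
    tokens = {token.strip() for token in log.split("|")}
    for marker, record in OVERRIDES.items():
        if marker in tokens:
            return dict(record, source="rule")
    record = dict(model_fn(log), source="model")
    if record.get("risk_level") not in RISK_LEVELS:
        # Flag malformed predictions so they can be held back from ingestion.
        return {"event_type": "unknown", "provider": "unknown",
                "risk_level": "unknown", "source": "rejected"}
    return record

def stub_model(log: str) -> dict[str, str]:
    """Stands in for the real model in this example."""
    return {"event_type": "authentication_success", "provider": "azure", "risk_level": "low"}

result = guarded_translate("SecurityEvent | EventID=1102 | SubjectUserName=admin", stub_model)
print(result["event_type"], result["source"])
# audit_log_cleared rule
```

The `source` field records which path produced each record, so rule hits, model predictions, and rejections can be counted separately.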
