🔍 Native Log Translator

Maps heterogeneous cloud, OS, and network-appliance logs to a unified, normalized schema (event_type, provider, risk_level) using seq2seq generation.

Fine-tuned from google-t5/t5-base using LoRA (PEFT) on a curated multi-provider security log dataset. Trained on Kaggle T4 x2.


📊 Evaluation Results

Tested on 14 cases (9 seen during training + 5 unseen generalisation):

| Input log | Predicted (event_type / provider / risk_level) | Expected (event_type / provider / risk_level) | Status |
|---|---|---|---|
| AzureSignInLogs \| ResultType=0 | authentication_success / azure / low | authentication_success / azure / low | ✅ |
| AzureSignInLogs \| ResultType=50126 | account_disabled / azure / medium | authentication_failure / azure / high | ⚠️ |
| SecurityEvent \| EventID=4688 \| NewProcessName=mimikatz.exe | suspicious_process_creation / windows / critical | suspicious_process_creation / windows / critical | ✅ |
| SecurityEvent \| EventID=1102 \| SubjectUserName=admin | explicit_credential_use / windows / medium | audit_log_cleared / windows / critical | ⚠️ |
| CloudTrail \| eventName=DeleteTrail | resource_deletion / aws / medium | audit_trail_deletion / aws / critical | ⚠️ |
| CloudTrail \| eventName=CreateAccessKey | access_key_created / aws / high | access_key_created / aws / high | ✅ |
| GCPAuditLog \| methodName=SetIamPolicy | explicit_policy_change / gcp / high | iam_policy_change / gcp / high | ✅ |
| Syslog \| ProcessName=sudo \| COMMAND=/bin/bash | privilege_escalation / linux / critical | privilege_escalation / linux / critical | ✅ |
| CommonSecurityLog \| ThreatName=Mirai.Botnet | botnet_traffic_blocked / fortinet / critical | botnet_traffic_blocked / fortinet / critical | ✅ |
| AzureSignInLogs \| ResultType=50055 (unseen) | account_disabled / azure / medium | auth_error / azure / ? | ✅ |
| CloudTrail \| eventName=DeleteUser (unseen) | user_deletion / aws / medium | user_deletion / aws / ? | ✅ |
| SecurityEvent \| EventID=4625 \| LogonType=10 (unseen) | authentication_failure / windows / high | authentication_failure / windows / high | ✅ |
| Syslog \| SyslogMessage=password changed for root (unseen) | authentication_failure / linux / medium | password_change / linux / ? | ✅ |
| CommonSecurityLog \| DeviceAction=deny \| DestPort=3389 (unseen) | network_connection_blocked / paloalto / medium | rdp_blocked / paloalto / ? | ✅ |

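Overall Score: 11/14 cases marked ✅ (about 79%)

Note that ✅ does not require an exact match: four passing rows (SetIamPolicy, ResultType=50055, password changed for root, DestPort=3389) predict an event_type that differs from the expected one, and the unseen cases have no expected risk level (?).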
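
The reported score can be cross-checked against a stricter criterion. The sketch below re-scores the 14 cases by exact match on every field, skipping any expected field given as `?`. The predicted and expected values are copied from the table; the helper names and the exact-match rule are assumptions made for illustration, not the scoring method used by the author.

```python
# (predicted, expected) pairs copied from the evaluation table, in order.
ROWS = [
    ("authentication_success / azure / low", "authentication_success / azure / low"),
    ("account_disabled / azure / medium", "authentication_failure / azure / high"),
    ("suspicious_process_creation / windows / critical", "suspicious_process_creation / windows / critical"),
    ("explicit_credential_use / windows / medium", "audit_log_cleared / windows / critical"),
    ("resource_deletion / aws / medium", "audit_trail_deletion / aws / critical"),
    ("access_key_created / aws / high", "access_key_created / aws / high"),
    ("explicit_policy_change / gcp / high", "iam_policy_change / gcp / high"),
    ("privilege_escalation / linux / critical", "privilege_escalation / linux / critical"),
    ("botnet_traffic_blocked / fortinet / critical", "botnet_traffic_blocked / fortinet / critical"),
    ("account_disabled / azure / medium", "auth_error / azure / ?"),
    ("user_deletion / aws / medium", "user_deletion / aws / ?"),
    ("authentication_failure / windows / high", "authentication_failure / windows / high"),
    ("authentication_failure / linux / medium", "password_change / linux / ?"),
    ("network_connection_blocked / paloalto / medium", "rdp_blocked / paloalto / ?"),
]


def split_fields(label: str) -> list[str]:
    """Split 'event_type / provider / risk_level' into its three fields."""
    return [part.strip() for part in label.split("/")]


def strict_match(predicted: str, expected: str) -> bool:
    """True when every labelled expected field equals the prediction ('?' is skipped)."""
    pairs = zip(split_fields(predicted), split_fields(expected))
    return all(exp == "?" or pred == exp for pred, exp in pairs)


strict_hits = sum(strict_match(pred, exp) for pred, exp in ROWS)
provider_hits = sum(split_fields(pred)[1] == split_fields(exp)[1] for pred, exp in ROWS)

print(f"strict: {strict_hits}/{len(ROWS)}, provider: {provider_hits}/{len(ROWS)}")
# strict: 7/14, provider: 14/14
```

Under this stricter rule the provider field is always right, while the event type and risk level account for all of the misses.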


🚀 Quick Start

import torch
from transformers import T5ForConditionalGeneration, AutoTokenizer
from peft import PeftModel

MODEL_REPO = "Swapnanil09/native-log-translator"
BASE_MODEL  = "google-t5/t5-base"

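# Use half precision on GPU only; float16 may be slow or unsupported on CPU.
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

tokenizer = AutoTokenizer.from_pretrained(MODEL_REPO, use_fast=True)
# device_map="auto" requires the `accelerate` package.
base = T5ForConditionalGeneration.from_pretrained(
    BASE_MODEL, torch_dtype=dtype, device_map="auto"
)
# Attach the LoRA adapter weights to the base model.
model = PeftModel.from_pretrained(base, MODEL_REPO)
model.eval()

def translate_log(log: str) -> str:
    """Translate one raw log line into the normalized schema text."""
    # Inputs longer than 128 tokens are truncated.
    inputs = tokenizer(log, return_tensors="pt",
                       max_length=128, truncation=True).to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs, max_new_tokens=64,
            num_beams=5, early_stopping=True
        )
    return tokenizer.decode(out[0], skip_special_tokens=True)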

print(translate_log("AzureSignInLogs | ResultType=0"))
# event_type: authentication_success
# provider: azure
# risk_level: low

print(translate_log("SecurityEvent | EventID=4688 | NewProcessName=mimikatz.exe"))
# event_type: suspicious_process_creation
# provider: windows
# risk_level: critical

📋 Output Schema

| Field | Values |
|---|---|
| event_type | authentication_success · authentication_failure · privilege_escalation · resource_deletion · suspicious_process_creation · audit_log_cleared · iam_policy_change · access_key_created · botnet_traffic_blocked · user_deletion ... |
| provider | azure · aws · gcp · windows · linux · paloalto · cisco · fortinet |
| risk_level | low · medium · high · critical |
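
Downstream code usually needs the three fields as structured data rather than free text. The following is a minimal sketch of one way to extract them, assuming the model emits `event_type: ...`, `provider: ...` and `risk_level: ...` labels as shown in the Quick Start comments; `parse_translation` is a hypothetical helper, not part of the published model.

```python
import re

# Matches "key: value" for the three schema fields, regardless of whether the
# fields are separated by spaces or newlines (the separator is an assumption).
FIELD_PATTERN = re.compile(r"(event_type|provider|risk_level)\s*:\s*([A-Za-z0-9_]+)")
REQUIRED_FIELDS = {"event_type", "provider", "risk_level"}


def parse_translation(text: str) -> dict[str, str]:
    """Return the schema fields found in the model output, or raise if any is missing."""
    fields = {key: value for key, value in FIELD_PATTERN.findall(text)}
    missing = REQUIRED_FIELDS - fields.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return fields


record = parse_translation("event_type: authentication_success provider: azure risk_level: low")
print(record["event_type"], record["provider"], record["risk_level"])
```

Raising on a missing field, rather than filling in a default, keeps malformed generations from silently entering later stages.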

📦 Supported Log Sources

| Provider | Log Types |
|---|---|
| Azure | SignInLogs · Activity · NSGFlowLogs · KeyVault |
| AWS | CloudTrail |
| GCP | Audit Logs |
| Windows | Security Events 4624 · 4625 · 4648 · 4657 · 4663 · 4688 · 4698 · 4720 · 4732 · 4740 · 1102 |
| Linux | Syslog (auth · kern · cron) |
| Network | Palo Alto · Cisco · Fortinet via CommonSecurityLog |
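
The example inputs in this card all follow the same shape: a source table name followed by `key=value` pairs joined with ` | `. A small helper like the one below could build that string from a parsed record. The function name is hypothetical, and whether every training example used exactly this serialization is an assumption.

```python
def format_log(source: str, fields: dict[str, object]) -> str:
    """Serialize a record as 'Source | key=value | key=value', keeping field order."""
    parts = [source] + [f"{key}={value}" for key, value in fields.items()]
    return " | ".join(parts)


line = format_log("SecurityEvent", {"EventID": 4688, "NewProcessName": "mimikatz.exe"})
print(line)
# SecurityEvent | EventID=4688 | NewProcessName=mimikatz.exe
```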

⚙️ Training Details

| Setting | Value |
|---|---|
| Base model | google-t5/t5-base |
| Architecture | Encoder-decoder (native seq2seq) |
| Method | LoRA (PEFT) |
| Task type | SEQ_2_SEQ_LM |
| LoRA rank / alpha | 32 / 64 |
| Target modules | q · k · v · o |
| Epochs | 40 |
| Effective batch size | 16 (4 per device × 2 GPUs × 2 gradient-accumulation steps) |
| Learning rate | 3e-4 with warmup + weight decay |
| Decoding strategy | Beam search (beams=5) |
| Hardware | Kaggle T4 x2 |
| Trainable parameters | ~1.5% of total |
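
As a sanity check on the settings above, the adapter size can be estimated by hand. The sketch below collects the listed LoRA settings in a plain dict and counts the added weights from the published t5-base dimensions. It ignores layer norms and relative position biases, so the base size is approximate, and the helper names are hypothetical.

```python
# Settings copied from the table above; with PEFT they would typically be passed
# as peft.LoraConfig(**LORA_SETTINGS). No dropout value is listed, so none is set.
LORA_SETTINGS = {
    "r": 32,
    "lora_alpha": 64,
    "target_modules": ["q", "k", "v", "o"],
    "task_type": "SEQ_2_SEQ_LM",
}

# Published t5-base dimensions (12 encoder and 12 decoder layers).
D_MODEL, D_FF, VOCAB, N_LAYERS = 768, 3072, 32128, 12


def lora_params(rank: int, n_modules: int) -> int:
    """LoRA adds two matrices (d_model x r and r x d_model) per adapted projection."""
    # Encoder self-attention plus decoder self- and cross-attention blocks.
    attention_blocks = N_LAYERS + 2 * N_LAYERS
    return attention_blocks * n_modules * 2 * D_MODEL * rank


def base_params() -> int:
    """Approximate base size: embeddings, attention and feed-forward weights only."""
    attn = 4 * D_MODEL * D_MODEL
    ff = 2 * D_MODEL * D_FF
    encoder = N_LAYERS * (attn + ff)
    decoder = N_LAYERS * (2 * attn + ff)
    return VOCAB * D_MODEL + encoder + decoder


def trainable_share(rank: int, n_modules: int) -> float:
    added = lora_params(rank, n_modules)
    return added / (base_params() + added)


n_targets = len(LORA_SETTINGS["target_modules"])
print(f"{trainable_share(LORA_SETTINGS['r'], n_targets):.2%}")  # 3.08%
print(f"{trainable_share(16, n_targets):.2%}")                  # 1.56%
```

Under these assumptions, rank 32 on all four projections comes to roughly 3% of the parameters, while the reported ~1.5% is closer to what rank 16 (or two target modules at rank 32) would give. Calling `print_trainable_parameters()` on the loaded PEFT model reports the actual figure.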

🔍 Known Gaps & Improvements

| Gap | Affected Cases | Fix |
|---|---|---|
| ResultType=50126 maps to account_disabled instead of authentication_failure | Azure SignIn error codes | Add more Azure ResultType variants to training data |
| EventID=1102 maps to explicit_credential_use instead of audit_log_cleared | Windows audit events | Add more Windows EventID examples |
| DeleteTrail maps to resource_deletion instead of audit_trail_deletion | AWS CloudTrail specific ops | Add CloudTrail-specific deletion variants |
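
Until the training data is extended, the three known misclassifications could be patched after inference. The sketch below is a stopgap override table, which is a different technique from the retraining fixes listed above; the corrected labels are taken from the expected values in the evaluation results, and `apply_known_overrides` is a hypothetical helper.

```python
# Input markers for the three known gaps, mapped to the expected
# (event_type, provider, risk_level) from the evaluation results.
KNOWN_OVERRIDES = {
    "ResultType=50126": ("authentication_failure", "azure", "high"),
    "EventID=1102": ("audit_log_cleared", "windows", "critical"),
    "eventName=DeleteTrail": ("audit_trail_deletion", "aws", "critical"),
}


def apply_known_overrides(log: str, prediction: tuple[str, str, str]) -> tuple[str, str, str]:
    """Replace the model's prediction when the log contains a known problem marker."""
    # Compare whole ' | '-separated tokens so EventID=1102 does not match EventID=11020.
    tokens = {part.strip() for part in log.split("|")}
    for marker, corrected in KNOWN_OVERRIDES.items():
        if marker in tokens:
            return corrected
    return prediction


fixed = apply_known_overrides(
    "SecurityEvent | EventID=1102 | SubjectUserName=admin",
    ("explicit_credential_use", "windows", "medium"),
)
print(fixed)
# ('audit_log_cleared', 'windows', 'critical')
```

An override table only covers inputs that are already known to fail, so it complements rather than replaces the data fixes in the table.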

⚠️ Limitations

  • Trained on ~120 curated + augmented examples; fine-tune on your own corpus for production use
  • Risk level calibration improves with more labelled examples per provider
  • Validate schema output before ingesting into automated SIEM pipelines
  • Not a drop-in replacement for rule-based parsers without a validation layer
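
A validation layer of the kind recommended above can be quite small. The following sketch checks a parsed record against the provider and risk_level values listed in the Output Schema. Because the event_type list in this card is open-ended, only its shape is checked here, and that snake_case rule is an assumption, as is the `validate_record` name.

```python
import re

# Closed value sets copied from the Output Schema section.
ALLOWED_PROVIDERS = {"azure", "aws", "gcp", "windows", "linux", "paloalto", "cisco", "fortinet"}
ALLOWED_RISK_LEVELS = {"low", "medium", "high", "critical"}
# Assumed shape for event_type: lowercase snake_case.
EVENT_TYPE_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def validate_record(record: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record passed all checks."""
    problems = []
    if not EVENT_TYPE_PATTERN.fullmatch(record.get("event_type", "")):
        problems.append("event_type is missing or not snake_case")
    if record.get("provider") not in ALLOWED_PROVIDERS:
        problems.append(f"unknown provider: {record.get('provider')!r}")
    if record.get("risk_level") not in ALLOWED_RISK_LEVELS:
        problems.append(f"unknown risk_level: {record.get('risk_level')!r}")
    return problems


good = {"event_type": "authentication_success", "provider": "azure", "risk_level": "low"}
bad = {"event_type": "authentication_success", "provider": "azur", "risk_level": "?"}
print(validate_record(good))       # []
print(len(validate_record(bad)))   # 2
```

Records with a non-empty problem list can be routed to a review queue or a rule-based fallback instead of the SIEM pipeline.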