Model Card for LogLaye-LLaMA3.2B-QLoRA-finetuned-HDFS-logs
Model Details
Model Description
This model is a version of LLaMA 3.2B fine-tuned with QLoRA for log anomaly detection on HDFS (Hadoop Distributed File System) logs.
- Developed by: Abdoulaye MBAYE (FatLab, Fathala IT, ZigZeug)
- Finetuned from model: meta-llama/LLaMA-3.2B
- Language(s): English (Log data text)
- License: original Llama 3 license (non-commercial research use only)
- Model type: Causal Language Model (Instruction-tuned for classification)
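The exact training setup is not documented in this card. As a rough sketch, a QLoRA fine-tune of this kind is typically built with 4-bit NF4 quantization (bitsandbytes) plus low-rank adapters (peft); every hyperparameter below is an illustrative assumption, not the configuration actually used for this model.

```python
# Illustrative QLoRA setup; all hyperparameters are assumptions,
# not the actual training configuration of this model.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "meta-llama/LLaMA-3.2B"  # base model as named in this card

# QLoRA loads the frozen base model in 4-bit NF4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Trainable low-rank adapters on the attention projections
# (rank, alpha, and target modules are typical defaults, not confirmed values).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```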
Model Sources
- Repository: https://huggingface.co/ZigZeug/LogLaye-LLaMA3.2B-QLoRA-finetuned-HDFS-logs
- Dataset: https://huggingface.co/datasets/ZigZeug/HDFS-logs-cleaned-chatml
Uses
Direct Use
This model classifies HDFS log lines into two labels:
- "normal" → expected system behavior.
- "anomalous" → suspicious or error-prone system behavior.
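Because the model answers in free text, the generation has to be reduced to one of these two labels downstream. A minimal post-processing helper might look like the following; the `extract_label` name and the substring heuristic are illustrative assumptions, not part of the released model.

```python
# Hypothetical post-processing helper (not shipped with the model):
# collapse a free-form completion into one of the two supported labels.
def extract_label(generated_text: str) -> str:
    text = generated_text.lower()
    # Check anomaly wording first so "abnormal" is not misread as "normal".
    if "anomal" in text or "abnormal" in text:
        return "anomalous"
    if "normal" in text:
        return "normal"
    # Fail closed: treat unparseable answers as anomalous for human review.
    return "anomalous"

assert extract_label("This log is normal.") == "normal"
assert extract_label("Anomalous: block replication error") == "anomalous"
```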
Downstream Use
- Infrastructure log monitoring (a minimal sketch follows this list)
- Automated ML-based observability
- Large-scale system supervision
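As an illustration of the log-monitoring use case, here is a hypothetical `scan_log_file` helper that flags anomalous lines in a plain-text log file. It assumes `model` and `tokenizer` are loaded as shown under "How to Get Started" below and reuses the `extract_label` sketch above; the function itself is an assumption, not an API provided with the model.

```python
# Hypothetical monitoring loop (illustrative only). Assumes `model` and
# `tokenizer` are loaded as in "How to Get Started", and reuses the
# extract_label helper sketched above.
SYSTEM_PROMPT = ("You are an expert in HDFS log analysis. "
                 "Classify if the following log is normal or anomalous.")

def scan_log_file(path, model, tokenizer, max_lines=1000):
    flagged = []
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            messages = [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"Log: {line.strip()}"},
            ]
            chat = tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=True
            )
            inputs = tokenizer(chat, return_tensors="pt").to(model.device)
            outputs = model.generate(**inputs, max_new_tokens=20)
            # Decode only the newly generated tokens, not the echoed prompt.
            answer = tokenizer.decode(
                outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
            )
            if extract_label(answer) == "anomalous":
                flagged.append((i + 1, line.strip()))  # 1-indexed line numbers
    return flagged
```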
Out-of-Scope Use
- Not designed for non-HDFS log formats.
- Not suitable for general-purpose natural language tasks.
Bias, Risks, and Limitations
- The model was trained on pre-processed HDFS logs; its behavior on logs from other systems is untested and may be unreliable.
- The model does not explain why a log is anomalous; it only predicts a label.
Recommendations
Always keep a human in the loop when deploying anomaly detection models in production.
How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ZigZeug/LogLaye-LLaMA3.2B-QLoRA-finetuned-HDFS-logs"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are an expert in HDFS log analysis. Classify if the following log is normal or anomalous."},
    {"role": "user", "content": "Log: PacketResponder 2 for block blk_-3552845605773916309 terminating"}
]

# Render the chat template and move inputs to wherever device_map placed the model.
chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(chat, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=20)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
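Because the prompt tokens are sliced off before decoding, the printed completion is just the model's answer, which should contain one of the two labels described under Direct Use.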