๐Ÿ›ก๏ธ Model Card: DDoS Detection using XGBoost (ONNX)

A high-performance model to detect DDoS attacks from network traffic flow data. Trained on CIC-DDoS2019, optimized with Optuna, and exported to ONNX for fast, portable inference.


Model Details

Model Description

  • Developed by: Sunny Thakur
  • Model type: Gradient Boosted Tree (XGBoost)
  • Language(s): Not applicable (numeric network-flow features, not text)
  • License: MIT
  • Finetuned from model: None (trained from scratch)


Uses

Direct Use

  • Detect DDoS attacks from structured flow data (CSV, Parquet, JSONL after transformation)
  • Ideal for cybersecurity monitoring systems, SOC pipelines, or SIEM integrations

Downstream Use

  • Can be integrated into larger threat detection systems
  • Can be extended to multi-class detection or traffic categorization

Out-of-Scope Use

  • Real-time packet-level classification without flow aggregation
  • NLP, audio, or image data tasks

Bias, Risks, and Limitations

  • Model may overfit synthetic DDoS traffic patterns
  • Limited to features available in CIC-DDoS2019
  • SMOTE oversampling may create synthetic minority patterns that don't generalize

Recommendations

  • Validate on real-world or updated datasets before deployment
  • Periodic retraining recommended as attack patterns evolve

How to Get Started with the Model

ONNX Inference (Python)

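import onnxruntime as ort
import numpy as np

# Load the exported ONNX model
session = ort.InferenceSession("ddos_model.onnx")
# Look up the input tensor name instead of hard-coding it
input_name = session.get_inputs()[0].name
# One flow per row, float32, features in the same order as in training
input_data = np.array([...], dtype=np.float32).reshape(1, -1)
outputs = session.run(None, {input_name: input_data})

Pickled Pipeline (Python)

import joblib

# Load the pickled pipeline
pipeline = joblib.load("ddos_detection_pipeline.pkl")
# input_df: a DataFrame with the same feature columns used in training
prediction = pipeline.predict(input_df)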
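
A model that takes a single numeric tensor reads features by position, so the column order at inference time has to match the order used in training. The following is a minimal sketch of one way to enforce that when a flow arrives as a dictionary. `FEATURE_ORDER`, `to_feature_row`, and the column `flow_duration` are placeholders for illustration; this card does not enumerate the model's actual feature list.

```python
from typing import List, Mapping, Sequence

# Hypothetical column list for illustration only; replace it with the
# feature list (and order) used when the model was trained.
FEATURE_ORDER: Sequence[str] = (
    "flow_duration",
    "requests_per_sec",
    "pkt_len_variation",
)


def to_feature_row(
    flow: Mapping[str, float], order: Sequence[str] = FEATURE_ORDER
) -> List[float]:
    """Return the flow's features as floats in a fixed column order.

    Raises KeyError when a required feature is missing, so a malformed
    record fails loudly instead of being scored with shifted columns.
    """
    missing = [name for name in order if name not in flow]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(flow[name]) for name in order]
```

The returned list can then be wrapped as `np.array(row, dtype=np.float32).reshape(1, -1)` before it is passed to the ONNX session.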

Training Details

Training Data

Dataset: CIC-DDoS2019

Source: https://www.kaggle.com/datasets/dhoogla/cicddos2019

Classes: Binary (Benign vs DDoS)

Training Procedure

Preprocessing: IP addresses, ports, and timestamps dropped

Engineered features: requests_per_sec, pkt_len_variation

Balancing: SMOTE (30% oversample minority)

Scaler: StandardScaler

Model: XGBoost

Tuning: Optuna (F1-score objective, 30 trials)
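
For readers unfamiliar with SMOTE, the balancing step can be illustrated as follows. This is a simplified, dependency-free sketch of the interpolation idea, not the training code. It assumes that "30%" means a target minority-to-majority ratio of 0.3 (the meaning of a float `sampling_strategy` in imbalanced-learn), which this card does not confirm. The function names are made up for the example.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def synthetic_needed(n_minority: int, n_majority: int, ratio: float = 0.3) -> int:
    """Synthetic minority rows needed so that minority/majority is about `ratio`.

    Assumption: `ratio` is the desired minority-to-majority ratio after
    resampling; the card's "30%" may mean something else.
    """
    target = round(ratio * n_majority)
    return max(0, target - n_minority)


def smote_sample(
    minority: Sequence[Vector], k: int, n_new: int, rng: random.Random
) -> List[List[float]]:
    """Create n_new points, each on the line segment between a minority row
    and one of its k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows")
    k = max(1, min(k, len(minority) - 1))
    out: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority rows
        nearest = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(nearest)]
        u = rng.random()  # interpolation weight in [0, 1)
        out.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return out
```

Because every synthetic row lies between two existing minority rows, the technique cannot produce attack patterns outside the region the training data already covers.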

Training Hyperparameters

n_estimators: 100–500 (tuned)

max_depth: 3–12 (tuned)

learning_rate: 0.001–0.2

gamma, colsample_bytree, scale_pos_weight: tuned

tree_method: hist

early_stopping_rounds: 20
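
The ranges listed above could be expressed as an Optuna search space along these lines. This is a sketch rather than the author's objective function: the name `suggest_xgb_params` is hypothetical, and sampling `learning_rate` on a log scale is an assumption. `gamma`, `colsample_bytree`, and `scale_pos_weight` are left out because their ranges are not stated in this card.

```python
from typing import Any, Dict


def suggest_xgb_params(trial: Any) -> Dict[str, Any]:
    """Build XGBoost parameters from an Optuna-style trial object.

    `trial` only needs `suggest_int` and `suggest_float`, which are the
    method names used by optuna.trial.Trial.
    """
    return {
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        # Log-scale sampling is an assumption; the card gives only the range.
        "learning_rate": trial.suggest_float("learning_rate", 0.001, 0.2, log=True),
        # Fixed (not tuned) settings taken from the list above.
        "tree_method": "hist",
        "early_stopping_rounds": 20,
    }
```

Inside an Optuna objective, the returned dictionary would typically be unpacked into `xgboost.XGBClassifier(**params)`, with the omitted parameters added once their ranges are known. Depending on the installed XGBoost version, `early_stopping_rounds` may need to be passed to `fit` instead of the constructor.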

Evaluation

Testing Data

20% hold-out split from full data

Stratified on class label

Metric       Value
Accuracy     99.98%
F1-Score     99.98%
AUC-PR       1.000
Precision    ~1.00
Recall       ~1.00
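
For reference, accuracy, precision, recall, and F1 relate to confusion-matrix counts as shown in this small sketch. The function name is illustrative, and the counts in the final line are invented for the example; they are not this model's results. AUC-PR is not included because it needs the predicted scores, not just the counts.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive (DDoS) class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts, purely illustrative
print(binary_metrics(tp=90, fp=10, fn=10, tn=890))
```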

Model Examination

Explainability

Integrated SHAP

Summary plots identify dominant flow-level features

Environmental Impact

Hardware: Kaggle Tesla T4 or P100

Training Time: < 1 hour

Carbon Emitted: Low (not quantified; can be estimated with the ML CO2 Impact calculator)

Technical Specifications

Architecture: Gradient Boosted Trees (XGBoost)

Format: ONNX + .pkl pipeline

Dependencies: XGBoost, ONNX, Scikit-learn, Optuna, SHAP

Citation

@misc{ddos-detection-xgboost-007,
  author = {Sunny Thakur (007)},
  title = {DDoS Detection Model - CICDDoS2019 - XGBoost + ONNX},
  year = {2025},
  url = {https://huggingface.co/darkknight25/ddos_xgboost_onnx}
}

Model Card Authors

Sunny Thakur 

Contact

GitHub: SunnyThakur25

LinkedIn: Sunny Thakur