Pentest Vulnerability Detector

Model Description

This is a fine-tuned version of DeepSeek-Coder-1.3B-Instruct, specialized for detecting security vulnerabilities in code.

Base Model: deepseek-ai/deepseek-coder-1.3b-instruct
Training Data: 440 synthetic vulnerability examples
Training Method: LoRA (Low-Rank Adaptation) with 4-bit quantization
Training Platform: Google Colab (free-tier T4 GPU)

Capabilities

The model can detect and analyze:

  • SQL Injection
  • Cross-Site Scripting (XSS)
  • Command Injection / RCE
  • Insecure Direct Object Reference (IDOR)
  • Server-Side Request Forgery (SSRF)
  • Authentication Bypass
  • Cross-Site Request Forgery (CSRF)
  • Path Traversal
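
To make the first category concrete, the following is a minimal sketch of the kind of pattern "SQL Injection" refers to, set next to a parameterized alternative. The table, the function names, and the payload are hypothetical and chosen only for illustration; they are not taken from the training data, and nothing here shows how the model itself would respond.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_vulnerable(conn, name):
    # String interpolation lets attacker-controlled input rewrite the query.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # A bound parameter is always treated as data, never as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = make_db()
payload = "x' OR '1'='1"

leaked = find_user_vulnerable(conn, payload)   # the injected OR clause matches every row
blocked = find_user_safe(conn, payload)        # no user has this literal name

print(len(leaked), len(blocked))
```

The vulnerable variant returns every row because the payload closes the string literal and appends a condition that is always true, while the parameterized variant returns nothing.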

Training Details

  • Examples: 440 vulnerability patterns
  • Epochs: 3
  • Batch Size: 2 (with gradient accumulation)
  • Learning Rate: 2e-4
  • LoRA Rank: 8
  • Quantization: 4-bit (NF4)
  • Training Time: ~45-60 minutes on a T4 GPU
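
As a rough sketch, the hyperparameters above could map onto PEFT and Transformers configuration objects as follows. Only the rank, the NF4 quantization type, the epoch count, the batch size, and the learning rate come from this card. Every value marked as an assumption (alpha, dropout, target modules, compute dtype, accumulation steps) is a common default chosen for illustration and may differ from what was actually used.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization for the frozen base weights (stated in this card).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: float16 compute on a T4
)

# LoRA adapter settings. Only r=8 is taken from this card.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,                         # assumption
    lora_dropout=0.05,                     # assumption
    target_modules=["q_proj", "v_proj"],   # assumption
    bias="none",
    task_type="CAUSAL_LM",
)

# Keyword arguments in the shape expected by transformers.TrainingArguments.
training_kwargs = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,      # assumption: the card gives no value
    "learning_rate": 2e-4,
}
```

Under the assumed accumulation value, the effective batch size would be 2 × 4 = 8. The card only states that gradient accumulation was used, so treat that number as a placeholder.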

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = "deepseek-ai/deepseek-coder-1.3b-instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "YOUR_USERNAME/pentest-vulnerability-detector")

# Analyze code
code = "SELECT * FROM users WHERE id = 'user_input'"
prompt = f"System: You are a security expert.\n\nUser: Analyze this code:\n{code}\n\nAssistant:"

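inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)

# Decode only the newly generated tokens so the prompt is not echoed back
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)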

Inference Script

For easier usage, use the provided inference script:

python inference_deepseek.py --model ./model --code "YOUR_CODE_HERE"
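
When the code to analyze lives in a file, passing it through a shell string invites quoting problems. The sketch below builds the argument list directly instead. It assumes that `inference_deepseek.py` sits in the current directory, accepts the whole snippet as the value of `--code`, and prints its analysis to standard output. None of this is confirmed beyond the two flags shown above, and the helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_command(model_dir, source_path):
    """Build the argument list for the inference script without shell quoting."""
    code = Path(source_path).read_text(encoding="utf-8")
    return [
        sys.executable,
        "inference_deepseek.py",
        "--model", str(model_dir),
        "--code", code,
    ]


def scan_file(model_dir, source_path):
    """Run the script on one file and return whatever it writes to stdout."""
    result = subprocess.run(
        build_command(model_dir, source_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return result.stdout
```

Very large files may exceed the operating system's limit on command-line argument length, so this approach is best suited to short snippets.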

Model Performance

For each analyzed snippet, the model's response is intended to cover:

  • Vulnerability type identification
  • Severity assessment (CRITICAL/HIGH/MEDIUM/LOW)
  • Detailed attack vector analysis
  • Specific remediation recommendations
  • Code-specific security guidance
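
Because the response is free text, downstream tooling has to extract the severity label itself. The following is a minimal sketch that assumes the label appears in upper case somewhere in the response, as in the list above. The exact output layout is not documented here, so the pattern and the helper names are illustrative only.

```python
import re

# Ordered from most to least severe, matching the labels listed in this card.
SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")
_SEVERITY_RE = re.compile(r"\b(CRITICAL|HIGH|MEDIUM|LOW)\b")


def extract_severity(response):
    """Return the first upper-case severity label in the response, or None."""
    match = _SEVERITY_RE.search(response)
    return match.group(1) if match else None


def should_escalate(response, threshold="HIGH"):
    """Flag a finding for human review if it is at or above the threshold."""
    severity = extract_severity(response)
    if severity is None:
        # No parseable label: send it to a human rather than drop it silently.
        return True
    return SEVERITIES.index(severity) <= SEVERITIES.index(threshold)
```

Matching is case-sensitive on purpose, so that ordinary words such as "high" in the explanatory text are not mistaken for a label.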

Limitations

  • Not 100% accurate; always verify findings manually
  • May produce false positives and false negatives
  • Best used as a pre-screening tool
  • Should complement, not replace, manual security testing
  • Trained only on synthetic data, so it may need further fine-tuning for specific use cases
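
One way to quantify the false positives and false negatives mentioned above is to run the model on a small set of snippets you have labeled yourself and compute precision and recall. The sketch below shows only the arithmetic. The sample predictions and labels are made up for illustration and are not results for this model.

```python
def precision_recall(predictions, labels):
    """Compute precision and recall for boolean 'vulnerable' predictions."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p and y)        # flagged and truly vulnerable
    fp = sum(1 for p, y in pairs if p and not y)    # flagged but actually safe
    fn = sum(1 for p, y in pairs if not p and y)    # missed vulnerability
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: True means "vulnerable".
predictions = [True, True, False, True, False]
labels = [True, False, False, True, True]

precision, recall = precision_recall(predictions, labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision means reviewers spend time on false alarms, while low recall means real issues slip through. Which one matters more depends on how the pre-screening step is used.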

Ethical Use

This model is intended for:

  • Security research
  • Penetration testing (authorized only)
  • Code review and security auditing
  • Educational purposes

Do not use for:

  • Unauthorized system access
  • Malicious activities
  • Illegal purposes

Training Data

The model was trained on 440 synthetic vulnerability examples covering:

  • 100 SQL Injection patterns
  • 80 XSS patterns
  • 60 Command Injection patterns
  • 50 IDOR patterns
  • 40 SSRF patterns
  • 40 Authentication Bypass patterns
  • 40 CSRF patterns
  • 30 Path Traversal patterns
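
The per-category counts above can be checked against the stated total and turned into shares with a few lines of Python. This is only a sanity check on the numbers listed in this card.

```python
# Example counts per category, as listed in this card.
counts = {
    "SQL Injection": 100,
    "XSS": 80,
    "Command Injection": 60,
    "IDOR": 50,
    "SSRF": 40,
    "Authentication Bypass": 40,
    "CSRF": 40,
    "Path Traversal": 30,
}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Print one row per category: name, count, share of the dataset.
for name, n in counts.items():
    print(f"{name:<22}{n:>5}{shares[name]:>8.1%}")
print(f"{'Total':<22}{total:>5}")
```

The counts sum to 440, and SQL Injection has more than three times as many examples as Path Traversal. That imbalance could plausibly make the model less reliable on the smaller categories, although this card reports no per-category results.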

Citation

If you use this model, please cite:

@misc{pentest-vulnerability-detector,
  author = {YOUR_NAME},
  title = {Pentest Vulnerability Detector},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/pentest-vulnerability-detector}}
}

License

This model adapter is released under the Apache 2.0 License.

The base model (DeepSeek-Coder-1.3B-Instruct) is distributed under its own license terms, which continue to apply when the adapter is used with it.

Apache 2.0 License Summary:

  • ✅ Commercial use allowed
  • ✅ Modification allowed
  • ✅ Distribution allowed
  • ✅ Patent use allowed
  • ⚠️ Must include license and copyright notice
  • ⚠️ Must state changes made

See the LICENSE file for full terms.

Contact

For questions or issues, please open an issue on the model repository.

Acknowledgments

  • Base model: DeepSeek-Coder by DeepSeek AI
  • Training framework: Hugging Face Transformers, PEFT
  • Training platform: Google Colab