Pentest Vulnerability Detector
Model Description
This is a fine-tuned version of DeepSeek-Coder-1.3B-Instruct, specialized for detecting security vulnerabilities in code.
Base Model: deepseek-ai/deepseek-coder-1.3b-instruct
Training Data: 440 synthetic vulnerability examples
Training Method: LoRA (Low-Rank Adaptation) with 4-bit quantization
Training Platform: Google Colab (free-tier T4 GPU)
Capabilities
The model can detect and analyze:
- SQL Injection
- Cross-Site Scripting (XSS)
- Command Injection / RCE
- Insecure Direct Object Reference (IDOR)
- Server-Side Request Forgery (SSRF)
- Authentication Bypass
- Cross-Site Request Forgery (CSRF)
- Path Traversal
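For a quick smoke test of each category above, a small set of sample inputs can be handy. The snippets below are hypothetical one-liners written for illustration; they are not taken from the training set, and names such as `user_id`, `host`, and `Invoice` are placeholders.

```python
# Hypothetical sample inputs, one per category listed above.
# Each value is just a string of vulnerable-looking code to pass to the model.
SAMPLES = {
    "SQL Injection": 'cursor.execute("SELECT * FROM users WHERE id = " + user_id)',
    "Cross-Site Scripting (XSS)": 'return "<p>Hello " + request.args["name"] + "</p>"',
    "Command Injection / RCE": 'os.system("ping -c 1 " + host)',
    "Insecure Direct Object Reference (IDOR)": 'return Invoice.query.get(request.args["invoice_id"])',
    "Server-Side Request Forgery (SSRF)": 'requests.get(request.args["url"])',
    "Authentication Bypass": 'if request.cookies.get("is_admin") == "true": grant_admin()',
    "Cross-Site Request Forgery (CSRF)": '@app.route("/transfer", methods=["POST"])  # no CSRF token check',
    "Path Traversal": 'open("/var/www/uploads/" + filename).read()',
}

for category, snippet in SAMPLES.items():
    print(f"{category}: {snippet}")
```

Real findings rarely fit on one line, so these are best treated as a sanity check rather than an evaluation set.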
Training Details
- Examples: 440 vulnerability patterns
- Epochs: 3
- Batch Size: 2 (with gradient accumulation)
- Learning Rate: 2e-4
- LoRA Rank: 8
- Quantization: 4-bit (NF4)
- Training Time: ~45-60 minutes on a T4 GPU
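To make these hyperparameters more concrete, the sketch below works through two pieces of arithmetic: how many trainable parameters a rank-8 LoRA adapter adds to a single weight matrix, and how many optimizer steps 440 examples over 3 epochs imply. The 2048 x 2048 matrix shape and the accumulation value of 4 are assumptions chosen for illustration; the card does not say which layers were adapted or how many accumulation steps were used.

```python
import math


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns two low-rank factors: A with shape (rank x d_in)
    and B with shape (d_out x rank).
    """
    return rank * (d_in + d_out)


def optimizer_steps(num_examples: int, batch_size: int, grad_accum: int, epochs: int) -> int:
    """Optimizer updates, rounding up per epoch (some trainers round down instead)."""
    effective_batch = batch_size * grad_accum
    return math.ceil(num_examples / effective_batch) * epochs


# Assumed shape of one adapted projection matrix (not stated in the card).
ASSUMED_DIM = 2048
# Assumed gradient accumulation steps (the card only says accumulation was used).
ASSUMED_GRAD_ACCUM = 4

print(lora_added_params(ASSUMED_DIM, ASSUMED_DIM, rank=8))
print(optimizer_steps(440, batch_size=2, grad_accum=ASSUMED_GRAD_ACCUM, epochs=3))
```

Under these assumptions the adapter adds 32,768 parameters per adapted matrix, against roughly 4.2 million in the full matrix, which is why a run like this fits on a single T4.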
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
base_model = "deepseek-ai/deepseek-coder-1.3b-instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, "YOUR_USERNAME/pentest-vulnerability-detector")
# Analyze code
code = "SELECT * FROM users WHERE id = 'user_input'"
prompt = f"System: You are a security expert.\n\nUser: Analyze this code:\n{code}\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
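Because `generate` returns the prompt tokens followed by the new tokens, the decoded `response` above starts with the prompt itself. The helpers below are a minimal sketch for rebuilding the same prompt layout and trimming the echoed prompt; `build_prompt` and `strip_prompt` are hypothetical names, not part of the repository.

```python
def build_prompt(code: str, system: str = "You are a security expert.") -> str:
    """Recreate the System/User/Assistant layout used in the usage example."""
    return f"System: {system}\n\nUser: Analyze this code:\n{code}\n\nAssistant:"


def strip_prompt(decoded: str, prompt: str) -> str:
    """Return only the generated answer from decoded output."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    # Fallback: keep everything after the first "Assistant:" marker.
    _, found, tail = decoded.partition("Assistant:")
    return tail.strip() if found else decoded.strip()


prompt = build_prompt("SELECT * FROM users WHERE id = 'user_input'")
# Stand-in for tokenizer.decode(...) output: the prompt plus a made-up answer.
decoded = prompt + " This query is vulnerable to SQL injection."
print(strip_prompt(decoded, prompt))
```

In the usage example, this would mean calling `strip_prompt(response, prompt)` before printing.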
Inference Script
For easier usage, use the provided inference script:
python inference_deepseek.py --model ./model --code "YOUR_CODE_HERE"
Model Performance
The model's responses typically include:
- Vulnerability type identification
- Severity assessment (CRITICAL/HIGH/MEDIUM/LOW)
- Detailed attack vector analysis
- Specific remediation recommendations
- Code-specific security guidance
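Since the response is free text, downstream tooling usually has to pull out the severity label itself. The sketch below assumes the response contains the word "Severity" followed shortly by one of the four labels; the card does not document the exact output layout, so the pattern and the sample text are assumptions.

```python
import re

# Assumed layout: "Severity" followed within a few punctuation characters by a label.
_SEVERITY_RE = re.compile(r"severity\W{0,5}(critical|high|medium|low)\b", re.IGNORECASE)


def extract_severity(response: str):
    """Return the severity label in upper case, or None if none is found."""
    match = _SEVERITY_RE.search(response)
    return match.group(1).upper() if match else None


# Made-up response text for illustration only.
sample = "Vulnerability: SQL Injection\nSeverity: CRITICAL\nFix: use parameterized queries."
print(extract_severity(sample))
```

A response that does not match returns `None`, which is a reasonable signal to route that finding to manual review.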
Limitations
- Not 100% accurate; always verify findings manually
- May produce false positives and false negatives
- Best used as a pre-screening tool
- Should complement, not replace, manual security testing
- Trained only on synthetic data, so it may need further fine-tuning for specific use cases
Ethical Use
This model is intended for:
- Security research
- Penetration testing (authorized only)
- Code review and security auditing
- Educational purposes
Do not use for:
- Unauthorized system access
- Malicious activities
- Illegal purposes
Training Data
The model was trained on 440 synthetic vulnerability examples covering:
- 100 SQL Injection patterns
- 80 XSS patterns
- 60 Command Injection patterns
- 50 IDOR patterns
- 40 SSRF patterns
- 40 Authentication Bypass patterns
- 40 CSRF patterns
- 30 Path Traversal patterns
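The per-category counts above can be checked against the stated total and turned into shares, which makes the class imbalance easier to see. This is a small illustrative calculation using only the numbers listed in this section.

```python
# Per-category example counts as listed in this section.
CATEGORY_COUNTS = {
    "SQL Injection": 100,
    "XSS": 80,
    "Command Injection": 60,
    "IDOR": 50,
    "SSRF": 40,
    "Authentication Bypass": 40,
    "CSRF": 40,
    "Path Traversal": 30,
}

total = sum(CATEGORY_COUNTS.values())
assert total == 440  # matches the stated dataset size

# Share of the dataset per category, in percent.
shares = {name: round(100 * count / total, 1) for name, count in CATEGORY_COUNTS.items()}
for name, share in shares.items():
    print(f"{name}: {share}%")
```

SQL Injection has more than three times as many examples as Path Traversal, so detection quality may differ between categories.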
Citation
If you use this model, please cite:
@misc{pentest-vulnerability-detector,
author = {YOUR_NAME},
title = {Pentest Vulnerability Detector},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/YOUR_USERNAME/pentest-vulnerability-detector}}
}
License
This model adapter is released under the Apache 2.0 License.
The base model (DeepSeek-Coder-1.3B-Instruct) has its own license terms.
Apache 2.0 License Summary:
- ✅ Commercial use allowed
- ✅ Modification allowed
- ✅ Distribution allowed
- ✅ Patent use allowed
- ⚠️ Must include license and copyright notice
- ⚠️ Must state changes made
See the LICENSE file for full terms.
Contact
For questions or issues, please open an issue on the model repository.
Acknowledgments
- Base model: DeepSeek-Coder by DeepSeek AI
- Training framework: Hugging Face Transformers, PEFT
- Training platform: Google Colab