Pentest Vulnerability Detector

Model Description

This is a fine-tuned version of DeepSeek-Coder-1.3B-Instruct, specialized for detecting security vulnerabilities in code.

Base Model: deepseek-ai/deepseek-coder-1.3b-instruct
Training Data: 440 synthetic vulnerability examples
Training Method: LoRA (Low-Rank Adaptation) with 4-bit quantization
Training Platform: Google Colab (Free T4 GPU)

Capabilities

The model can detect and analyze:

SQL Injection
Cross-Site Scripting (XSS)
Command Injection / RCE
Insecure Direct Object Reference (IDOR)
Server-Side Request Forgery (SSRF)
Authentication Bypass
Cross-Site Request Forgery (CSRF)
Path Traversal
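A quick way to check that the adapter responds sensibly in each category is to keep one small vulnerable snippet per class and run each through the prompt from the Usage section. The snippets and the `PROBES` name below are illustrative assumptions written for this sketch. They are not samples from the training set, and they are only formatted as strings, never executed.

```python
# Hypothetical probe snippets, one per category listed above (strings only, never run).
PROBES = {
    "SQL Injection": '''cursor.execute("SELECT * FROM users WHERE name = '" + name + "'")''',
    "Cross-Site Scripting (XSS)": 'return "<p>Hello " + request.args["name"] + "</p>"',
    "Command Injection / RCE": 'os.system("ping -c 1 " + host)',
    "Insecure Direct Object Reference (IDOR)": 'return Invoice.query.get(request.args["id"]).to_json()',
    "Server-Side Request Forgery (SSRF)": 'return requests.get(request.args["url"]).text',
    "Authentication Bypass": 'if request.cookies.get("role") == "admin": return admin_panel()',
    "Cross-Site Request Forgery (CSRF)": 'app.config["WTF_CSRF_ENABLED"] = False',
    "Path Traversal": 'return open("/srv/files/" + request.args["name"]).read()',
}

# Same prompt template as the usage example on this card.
TEMPLATE = "System: You are a security expert.\n\nUser: Analyze this code:\n{code}\n\nAssistant:"
prompts = {name: TEMPLATE.format(code=snippet) for name, snippet in PROBES.items()}

for name, prompt in prompts.items():
    print(f"--- {name} ---\n{prompt}\n")
```

Each prompt can then be passed to `model.generate` as in the Usage example, and the reply compared against the expected category by hand.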

Training Details

Examples: 440 vulnerability patterns
Epochs: 3
Batch Size: 2 (with gradient accumulation)
Learning Rate: 2e-4
LoRA Rank: 8
Quantization: 4-bit (NF4)
Training Time: ~45-60 minutes on T4 GPU
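To make these settings concrete, the sketch below works out how many optimizer updates they imply and roughly how many weights LoRA trains. The card does not give the gradient-accumulation factor, so 4 is an assumption. The layer count (24), hidden size (2048), and choice of `q_proj`/`v_proj` as target modules are also assumptions to check against the base model's `config.json` and the actual training script.

```python
import math

# Values from the Training Details list above.
NUM_EXAMPLES = 440
EPOCHS = 3
PER_DEVICE_BATCH = 2
LORA_RANK = 8

# ASSUMPTION: the card does not state the accumulation factor.
GRAD_ACCUM_STEPS = 4

def optimizer_steps(n: int, batch: int, accum: int, epochs: int) -> int:
    """Approximate optimizer updates: one per effective batch (batch * accum) per epoch."""
    effective_batch = batch * accum
    return math.ceil(n / effective_batch) * epochs

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) for each adapted weight matrix."""
    return rank * (d_in + d_out)

# ASSUMPTION: 24 layers, hidden size 2048, adapters on q_proj and v_proj only.
HIDDEN, LAYERS, TARGETS = 2048, 24, 2
trainable = LAYERS * TARGETS * lora_params(HIDDEN, HIDDEN, LORA_RANK)

print(optimizer_steps(NUM_EXAMPLES, PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, EPOCHS))  # 165
print(trainable)  # 1572864
```

Under these assumptions the run performs 165 updates and trains about 1.6M parameters, a tiny fraction of the 1.3B in the frozen base model. Changing `GRAD_ACCUM_STEPS` or `TARGETS` changes both numbers.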

Usage

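import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model (device_map="auto" requires the accelerate package)
base_model = "deepseek-ai/deepseek-coder-1.3b-instruct"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=dtype, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Load LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "elsiddik/pentest-vulnerability-detector")
model.eval()

# Analyze code: a query built by concatenating untrusted user input
code = "query = \"SELECT * FROM users WHERE id = '\" + user_input + \"'\""
prompt = f"System: You are a security expert.\n\nUser: Analyze this code:\n{code}\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)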
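When scanning many snippets, it helps to wrap the prompt template from the usage example in a function, and to strip the echoed prompt if the whole output sequence is decoded. The helper names below are hypothetical. The template string is copied from the example, and the assumption is that it matches the format the adapter was trained on.

```python
# Template copied from the usage example above.
# ASSUMPTION: it matches the format used during fine-tuning.
PROMPT_TEMPLATE = (
    "System: You are a security expert.\n\n"
    "User: Analyze this code:\n{code}\n\n"
    "Assistant:"
)

def build_prompt(code: str) -> str:
    """Insert a snippet into the template; braces inside `code` are kept as-is."""
    return PROMPT_TEMPLATE.format(code=code)

def extract_answer(decoded: str, prompt: str) -> str:
    """Strip the echoed prompt from a fully decoded generation."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    # Fallback when decoding altered whitespace: keep text after the last marker.
    marker = "Assistant:"
    idx = decoded.rfind(marker)
    return decoded[idx + len(marker):].strip() if idx != -1 else decoded.strip()

snippets = [
    'os.system("rm -rf " + path)',
    'data = {"key": value}',
]
for s in snippets:
    print(build_prompt(s))
```

The `str.format` call only interprets braces in the template itself, so snippets containing dictionaries or f-strings pass through unchanged.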

Inference Script

For convenience, you can run the provided inference script instead:

python inference_deepseek.py --model ./model --code "YOUR_CODE_HERE"

Model Output

The model's responses include:

Vulnerability type identification
Severity assessment (CRITICAL/HIGH/MEDIUM/LOW)
Detailed attack vector analysis
Specific remediation recommendations
Code-specific security guidance
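Because the output is free text, any pipeline that consumes it needs to pull out the severity label. The sketch below assumes the response mentions one of the four labels listed above. It prefers an explicit "Severity: X" field and falls back to the first upper-case label. The exact response format is not documented here, so the patterns are assumptions to adjust against real outputs.

```python
import re
from typing import Optional

LABELS = r"(CRITICAL|HIGH|MEDIUM|LOW)"
# Explicit field, any case, tolerating markdown such as "**Severity**: high".
SEVERITY_FIELD = re.compile(r"severity\W{0,5}" + LABELS + r"\b", re.IGNORECASE)
# Fallback: an upper-case label anywhere, so ordinary prose like "low" is ignored.
BARE_LABEL = re.compile(r"\b" + LABELS + r"\b")

def extract_severity(response: str) -> Optional[str]:
    """Return CRITICAL/HIGH/MEDIUM/LOW, or None if no label is found."""
    m = SEVERITY_FIELD.search(response)
    if m:
        return m.group(1).upper()
    m = BARE_LABEL.search(response)
    return m.group(1) if m else None

print(extract_severity("Vulnerability: SQL Injection\nSeverity: CRITICAL"))  # CRITICAL
```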

Limitations

Not 100% accurate: always verify findings manually
May produce false positives and false negatives
Best used as a pre-screening tool
Should complement, not replace, manual security testing
Trained on synthetic data: may need further fine-tuning for specific use cases
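Since findings must be verified by hand and false negatives are possible, a pre-screening setup should reorder the review queue rather than filter items out of it. A minimal sketch follows. It assumes each scanned item has already been reduced to a severity label, or `None` when no label could be parsed. The function name and ranking are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Lower rank is reviewed first. Unparsed results are NOT treated as safe;
# they go to the end of the queue but stay in it.
RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3, None: 4}

def review_queue(findings: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Order every scanned item for manual review, most severe first (stable sort)."""
    return [name for name, _ in sorted(findings, key=lambda f: RANK.get(f[1], 4))]

queue = review_queue([("a.py", "LOW"), ("b.py", None), ("c.py", "CRITICAL"), ("d.py", "HIGH")])
print(queue)  # ['c.py', 'd.py', 'a.py', 'b.py']
```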

Ethical Use

This model is intended for:

Security research
Penetration testing (authorized only)
Code review and security auditing
Educational purposes

Do not use for:

Unauthorized system access
Malicious activities
Illegal purposes

Training Data

The model was trained on 440 synthetic vulnerability examples covering:

100 SQL Injection patterns
80 XSS patterns
60 Command Injection patterns
50 IDOR patterns
40 SSRF patterns
40 Authentication Bypass patterns
40 CSRF patterns
30 Path Traversal patterns
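The counts sum to the stated 440 examples but are uneven. SQL Injection makes up about 23% of the data and Path Traversal under 7%, which may be worth keeping in mind when judging per-category reliability. The arithmetic, as a quick check:

```python
# Per-category counts from the Training Data list above.
COUNTS = {
    "SQL Injection": 100,
    "XSS": 80,
    "Command Injection": 60,
    "IDOR": 50,
    "SSRF": 40,
    "Authentication Bypass": 40,
    "CSRF": 40,
    "Path Traversal": 30,
}

total = sum(COUNTS.values())
assert total == 440  # matches the stated dataset size

for name, n in COUNTS.items():
    print(f"{name:<22} {n:>4} {n / total:6.1%}")
```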

Citation

If you use this model, please cite:

@misc{pentest-vulnerability-detector,
  author = {YOUR_NAME},
  title = {Pentest Vulnerability Detector},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/elsiddik/pentest-vulnerability-detector}}
}

License

This model adapter is released under the Apache 2.0 License.

The base model (DeepSeek-Coder-1.3B-Instruct) has its own license terms.

Apache 2.0 License Summary:

✅ Commercial use allowed
✅ Modification allowed
✅ Distribution allowed
✅ Patent use allowed
⚠️ Must include license and copyright notice
⚠️ Must state changes made

See LICENSE file for full terms.

Contact

For questions or issues, please open an issue on the model repository.

Acknowledgments

Base model: DeepSeek-Coder by DeepSeek AI
Training framework: Hugging Face Transformers, PEFT
Training platform: Google Colab


Model tree: elsiddik/pentest-vulnerability-detector is an adapter for the base model deepseek-ai/deepseek-coder-1.3b-instruct.