qwen-1.8b-adversarial-179k

This model is a fine-tuned version of Qwen/Qwen1.5-1.8B trained on 179,378 adversarial prompts.

Model Details

  • Base Model: Qwen/Qwen1.5-1.8B
  • Parameters: 1.84B (safetensors, F16)
  • Training Data: 179,378 adversarial examples
  • Training Time: 2h 51m on an NVIDIA A100
  • Final Loss: 1.973
  • Model Size: 3.42 GB

Training Configuration

  • Batch Size: 32
  • Gradient Accumulation: 4
  • Effective Batch Size: 128
  • Learning Rate: 2e-4
  • Training Steps: 1,402
  • LoRA Rank: 32
  • LoRA Alpha: 64

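The exact training script is not part of this repo, but the hyperparameters above map directly onto a standard peft LoRA setup. The sketch below shows one plausible configuration under that assumption; the target_modules choice and output path are illustrative, not taken from the actual run.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Hyperparameters taken from the table above; everything else is illustrative.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-1.8B")
lora_config = LoraConfig(
    r=32,                   # LoRA Rank
    lora_alpha=64,          # LoRA Alpha
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not confirmed by the source
)
model = get_peft_model(base, lora_config)

training_args = TrainingArguments(
    output_dir="qwen-1.8b-adversarial-179k",  # hypothetical path
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,            # effective batch size: 32 * 4 = 128
    learning_rate=2e-4,
    max_steps=1402,
)
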
Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("UdayGattu23/qwen-1.8b-adversarial-179k")
tokenizer = AutoTokenizer.from_pretrained("UdayGattu23/qwen-1.8b-adversarial-179k")

# Generate text (do_sample=True is required for temperature to take effect)
prompt = "Generate a direct prompt injection:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
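
Since the published weights are stored in F16, loading in half precision with automatic device placement keeps memory usage close to the 3.42 GB checkpoint size. The snippet below is one common way to do that; it assumes the accelerate package is installed for device_map support.

import torch
from transformers import AutoModelForCausalLM

# Half-precision loading; device_map="auto" requires accelerate.
model = AutoModelForCausalLM.from_pretrained(
    "UdayGattu23/qwen-1.8b-adversarial-179k",
    torch_dtype=torch.float16,
    device_map="auto",
)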

Training Results

  • Successfully generates adversarial prompts
  • Specializes in jailbreak scenarios
  • Produces prompt-injection patterns
  • Maintains a coherent output structure

Important Notes

⚠️ This is a complete fine-tuned model with merged LoRA weights, not just adapter weights.
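
For reference, merged weights like these are typically produced from a trained adapter with peft's merge_and_unload. This is a minimal sketch of that step, not the author's actual export script; the adapter path is hypothetical.

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-1.8B")
# "path/to/lora-adapter" is a hypothetical local path, not part of this repo.
peft_model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
merged = peft_model.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("qwen-1.8b-adversarial-179k-merged")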

⚠️ Research Use Only: This model is trained on adversarial data and should only be used for security research and testing purposes.

Training Date

2025-08-17
