CodeLlama Embedded Test Generator (v10)
This model generates production-grade unit tests for embedded C/C++ code. It is CodeLlama-7B with a fine-tuned LoRA adapter merged back into the base weights, trained with:
- 8-bit quantization
- Flash Attention 2
- Linear RoPE scaling (factor=2.0)
- Custom instruction tuning on embedded unit tests
Key Features
- Generates framework-agnostic test cases
- Optimized for embedded systems constraints
- Strict output formatting (no boilerplate)
- Special tokens for structured prompting
- 8192 context window support
Technical Specifications
| Component | Configuration |
|---|---|
| Base Model | CodeLlama-7B-HF |
| Fine-tuning | LoRA (r=64, alpha=32) |
| Quantization | 8-bit (llm_int8_threshold=6.0) |
| Attention | Flash Attention 2 |
| Context Window | 8192 tokens (RoPE scaled) |
| Training Epochs | 2 |
| Batch Size | 2 (effective 8 with gradient accumulation) |
| Learning Rate | 1.5e-4 |
| Optimizer | Paged AdamW 8-bit |
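The quantization, attention, and RoPE settings above correspond roughly to the following `transformers`/`bitsandbytes` loading configuration. This is a sketch inferred from the table, not the exact training script:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantization with the int8 threshold listed in the table
bnb_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",         # Flash Attention 2
    rope_scaling={"type": "linear", "factor": 2.0},  # linear RoPE scaling for the 8192-token window
    torch_dtype=torch.float16,
    device_map="auto",
)
```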
🧪 Prompt Structure
```
<|system|>
Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
Testing all functions and edge cases
Avoiding redundant headers
Covering boundary conditions and error scenarios
Using clear test names without repetitions
Generate ONLY test logic without framework-specific macros.
<|user|>
Generate unit tests for:
{your_function_here}
<|assistant|>
```
🚀 Inference Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver10"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2"  # Recommended for speed
)

def generate_tests(function_code):
    prompt = f"""<|system|>
Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
Testing all functions and edge cases
Avoiding redundant headers
Covering boundary conditions and error scenarios
Using clear test names without repetitions
Generate ONLY test logic without framework-specific macros.
<|user|>
Generate unit tests for:
{function_code}
<|assistant|>
"""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=8192).to("cuda")
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )
    # Decode only the newly generated tokens, skipping the echoed prompt
    generated = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

# Example usage
print(generate_tests("int add(int a, int b) { return a + b; }"))
```
Training Details
Dataset
- Source: `athrv/Embedded_Unittest2`
- Processing:
  - Filtered invalid/empty examples
  - Token length limit: 8192 tokens
  - Added special tokens: `<|system|>`, `<|user|>`, `<|assistant|>`, `<|end|>`
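A rough sketch of these processing steps with the `datasets` library is shown below. The actual preprocessing script is not part of this card, and the column names (`code`, `unit_test`) are assumptions rather than the dataset's real schema:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

MAX_LEN = 8192
SPECIAL_TOKENS = ["<|system|>", "<|user|>", "<|assistant|>", "<|end|>"]

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
tokenizer.add_special_tokens({"additional_special_tokens": SPECIAL_TOKENS})

ds = load_dataset("athrv/Embedded_Unittest2", split="train")

# Drop invalid/empty examples (column names are assumptions)
ds = ds.filter(lambda ex: ex.get("code") and ex.get("unit_test"))

# Enforce the 8192-token limit on the combined source + test text
ds = ds.filter(
    lambda ex: len(tokenizer(ex["code"] + ex["unit_test"]).input_ids) <= MAX_LEN
)
```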
LoRA Configuration
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",  # all linear layers
    ],
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)
```
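Before training, this configuration is typically applied to the quantized base model with `peft`. A minimal sketch (not the exact training script), assuming the 8-bit base model loaded as in the configuration sketch above:

```python
from peft import get_peft_model, prepare_model_for_kbit_training

# `model` is the 8-bit CodeLlama-7B base; `lora_config` is the LoraConfig above
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```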
Merge Process
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# Load and merge adapter
model = PeftModel.from_pretrained(base_model, "codellama_utests_optimized")
merged_model = model.merge_and_unload()

# Special token handling
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|system|>", "<|user|>", "<|assistant|>", "<|end|>"]}
)
base_model.resize_token_embeddings(len(tokenizer))
```
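After merging, the standalone model and extended tokenizer are typically saved so the result can be loaded without `peft`; the output path below is illustrative:

```python
# Save the merged weights and the extended tokenizer as a standalone checkpoint
merged_model.save_pretrained("codellama_utests_merged")  # illustrative output path
tokenizer.save_pretrained("codellama_utests_merged")
```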
Optimization Tips
- Hardware: Use a GPU with at least 24 GB of VRAM (A10/A100 recommended)
- Inference:
- Temperature: 0.2-0.4
- Top-p: 0.85-0.95
- Max New Tokens: 256-768
- Input Formatting:
- Keep functions under 200 lines
- Include complete signatures
- Avoid preprocessor directives
- Dataset Attribution: `athrv/Embedded_Unittest2`
Maintainer: Utkarsh524
Model ID: codellama_utests_full_new_ver10