CodeLlama Embedded Test Generator (v9)
This repository hosts an instruction-tuned CodeLlama-7B model that generates production-grade C/C++ unit tests for embedded systems. The model combines the base codellama/CodeLlama-7b-hf model with a custom LoRA adapter trained on a curated dataset of embedded software tests.
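For reference, the base-plus-adapter composition looks roughly like the sketch below, using the peft library. This is a minimal illustration only: this repository ships the combined model, and the adapter repo id shown is hypothetical, not a published artifact.

```python
# Sketch: composing the base model with a LoRA adapter via peft.
# NOTE: "Utkarsh524/codellama_utests_adapter" is a hypothetical id used
# for illustration; this repo already hosts the combined model.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "Utkarsh524/codellama_utests_adapter")
model = model.merge_and_unload()  # fold LoRA weights into the base for inference
```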
Prompt Schema
```
<|system|>
Generate unit tests for C/C++ code. Cover all edge cases, boundary conditions, and error scenarios.
Output Constraints:
- ONLY include test code (no explanations, headers, or main functions)
- Start directly with TEST(...)
- End after last test case
- Never include framework boilerplate
<|user|>
Write test cases for the following C/C++ code:
{your C/C++ function here}
<|assistant|>
```
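Since the schema is fixed apart from the code under test, a small helper can assemble it. The sketch below is a convenience we define here (`build_prompt` is not part of the model):

```python
# Sketch: assemble a prompt that follows the schema above.
SYSTEM = (
    "<|system|>\n"
    "Generate unit tests for C/C++ code. Cover all edge cases, boundary "
    "conditions, and error scenarios.\n"
    "Output Constraints:\n"
    "- ONLY include test code (no explanations, headers, or main functions)\n"
    "- Start directly with TEST(...)\n"
    "- End after last test case\n"
    "- Never include framework boilerplate\n"
)

def build_prompt(code: str) -> str:
    """Wrap a C/C++ function in the model's expected prompt schema."""
    return (
        SYSTEM
        + "<|user|>\nWrite test cases for the following C/C++ code:\n"
        + code.strip()
        + "\n<|assistant|>\n"
    )
```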
Quick Inference Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver9"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = """<|system|>
Generate unit tests for C/C++ code. Cover all edge cases, boundary conditions, and error scenarios.
Output Constraints:
ONLY include test code (no explanations, headers, or main functions)
Start directly with TEST(...)
End after last test case
Never include framework boilerplate
<|user|>
Write test cases for the following C/C++ code:
int add(int a, int b) { return a + b; }
<|assistant|>
"""

inputs = tokenizer(
    prompt,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=4096,
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.3,
    top_p=0.9,
)

# Decode the first (only) sequence and keep the text after the assistant tag
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("<|assistant|>")[-1].strip())
```
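Because decoding simply splits on the assistant tag, stray text can still precede the first test case. A defensive trim such as the following (our own helper, not part of the model) keeps only the TEST(...) blocks:

```python
import re

def extract_tests(generated: str) -> str:
    """Keep everything from the first TEST( onward; return '' if none found."""
    match = re.search(r"TEST\s*\(", generated)
    return generated[match.start():].strip() if match else ""
```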
Training & Optimization Details
| Step | Description |
|---|---|
| Dataset | athrv/Embedded_Unittest2 (filtered for valid code-test pairs) |
| Preprocessing | Token length filtering (≤4096), special token injection |
| Quantization | 8-bit (BitsAndBytesConfig), llm_int8_threshold=6.0 |
| LoRA Config | r=64, alpha=32, dropout=0.1 on q_proj/v_proj/k_proj/o_proj |
| Training | 4 epochs, per-device batch size 4 (effective batch 8), lr=2e-4, FP16 |
| Optimization | Paged AdamW 8-bit, gradient checkpointing, custom data collator |
| Special Tokens | Added `<\|system\|>`, `<\|user\|>`, `<\|assistant\|>` |
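For readers reproducing the setup, the quantization and LoRA rows of the table translate roughly into the following configuration objects. This is a sketch assuming the standard bitsandbytes/peft APIs; the exact training arguments may have differed:

```python
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 8-bit quantization with the threshold listed in the table
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)

# LoRA settings from the table, applied to the attention projections
lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```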
Tips for Best Results
- Temperature: 0.2–0.4
- Top-p: 0.85–0.95
- Max New Tokens: 256–512
- Input Formatting:
  - Include complete function signatures
  - Remove unnecessary comments
  - Keep functions under 200 lines
  - For long functions, split into logical units
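Putting these tips together, a small wrapper (our own helper, built on `build_prompt` and `extract_tests` from the sketches above) keeps the sampling parameters inside the recommended ranges:

```python
def generate_tests(code: str, temperature: float = 0.3, top_p: float = 0.9,
                   max_new_tokens: int = 384) -> str:
    """Generate unit tests with sampling settings in the recommended ranges."""
    inputs = tokenizer(build_prompt(code), return_tensors="pt",
                       truncation=True, max_length=4096).to(model.device)
    out = model.generate(**inputs, do_sample=True, temperature=temperature,
                         top_p=top_p, max_new_tokens=max_new_tokens)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    return extract_tests(text.split("<|assistant|>")[-1])
```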
Feedback & Citation
Dataset Credit: athrv/Embedded_Unittest2
Report Issues: open a discussion on the model's Hugging Face page
Maintainer: Utkarsh524
Model Version: v9 (trained for 4 epochs)