CodeLlama Embedded Test Generator (v9)

This repository hosts an instruction-tuned CodeLlama-7B model that generates production-grade C/C++ unit tests for embedded systems. The model combines the base codellama/CodeLlama-7b-hf model with a custom LoRA adapter trained on a curated dataset of embedded software tests.

Prompt Schema

<|system|>
Generate unit tests for C/C++ code. Cover all edge cases, boundary conditions, and error scenarios.
Output Constraints:

ONLY include test code (no explanations, headers, or main functions)
Start directly with TEST(...)
End after last test case
Never include framework boilerplate

<|user|>
Write test cases for the following C/C++ code:
{your C/C++ function here}

<|assistant|>
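
The schema can be assembled programmatically so that every request uses the same wording. The following is a minimal sketch, not the author's code: the names `SYSTEM_PROMPT` and `build_prompt` are hypothetical, and the exact line breaks between the role markers are an assumption taken from the inference example in this README.

```python
# Hypothetical helper: wraps a C/C++ snippet in the prompt schema.
# The system text is copied from the schema; the whitespace layout is assumed.
SYSTEM_PROMPT = (
    "Generate unit tests for C/C++ code. Cover all edge cases, "
    "boundary conditions, and error scenarios.\n"
    "Output Constraints:\n"
    "ONLY include test code (no explanations, headers, or main functions)\n"
    "Start directly with TEST(...)\n"
    "End after last test case\n"
    "Never include framework boilerplate"
)


def build_prompt(source_code: str) -> str:
    """Return the full prompt, ending with the open <|assistant|> turn."""
    return (
        f"<|system|>\n{SYSTEM_PROMPT}\n\n"
        "<|user|>\n"
        "Write test cases for the following C/C++ code:\n"
        f"{source_code.strip()}\n\n"
        "<|assistant|>\n"
    )


prompt = build_prompt("int add(int a, int b) { return a + b; }")
```

Building the prompt with a function avoids the brace escaping that an inline f-string needs when the C/C++ source contains `{` and `}`.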

Quick Inference Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver9"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = f"""<|system|>
Generate unit tests for C/C++ code. Cover all edge cases, boundary conditions, and error scenarios.
Output Constraints:
ONLY include test code (no explanations, headers, or main functions)
Start directly with TEST(...)
End after last test case
Never include framework boilerplate

<|user|>
Write test cases for the following C/C++ code:
int add(int a, int b) {{ return a + b; }}

<|assistant|>
"""

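# Tokenize the single prompt (no padding needed) and move the tensors
# to the device that device_map="auto" chose for the model.
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=4096,
).to(model.device)

# temperature and top_p only take effect when sampling is enabled.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3, top_p=0.9)

# generate() returns a batch; take the first sequence and keep only the
# tokens produced after the prompt.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True).strip())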
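
The prompt asks the model to start directly with TEST(...), but a sampled completion may still open with stray text. A small post-processing step can drop anything before the first test macro. This is a sketch under assumptions: `trim_to_tests` is a hypothetical helper, and matching the `TEST_F` and `TEST_P` variants assumes GoogleTest-style macros, which this README does not confirm.

```python
import re

# Matches a line that begins with TEST(, TEST_F( or TEST_P(.
# The _F/_P variants are an assumption about the test framework.
_TEST_MACRO = re.compile(r"^[ \t]*TEST(?:_F|_P)?[ \t]*\(", re.MULTILINE)


def trim_to_tests(generated: str) -> str:
    """Return the text from the first test macro onward, or "" if none is found."""
    match = _TEST_MACRO.search(generated)
    if match is None:
        return ""
    return generated[match.start():].strip()


sample = (
    "Here are the tests:\n"
    "TEST(AddTest, HandlesZero) {\n"
    "  EXPECT_EQ(add(0, 0), 0);\n"
    "}\n"
)
cleaned = trim_to_tests(sample)
```

An empty return value is a cheap signal that the completion contained no test case and the request should be retried.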

Training & Optimization Details

Step	Description
Dataset	athrv/Embedded_Unittest2 (filtered for valid code-test pairs)
Preprocessing	Token length filtering (≤4096), special token injection
Quantization	8-bit (BitsAndBytesConfig), llm_int8_threshold=6.0
LoRA Config	r=64, alpha=32, dropout=0.1 on q_proj/v_proj/k_proj/o_proj
Training	4 epochs, batch=4 (effective 8), lr=2e-4, FP16
Optimization	Paged AdamW 8-bit, gradient checkpointing, custom data collator
Special Tokens	Added `<
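
The table rows map roughly onto `transformers` and `peft` configuration objects. The sketch below collects the listed values as keyword arguments and shows how they might be passed to the library constructors. It is not the author's training script. `task_type`, `output_dir`, and `gradient_accumulation_steps=2` are assumptions; the last one is inferred from "batch=4 (effective 8)" on a single device.

```python
# Values copied from the table; entries marked "assumption" are not stated there.
BNB_KWARGS = {"load_in_8bit": True, "llm_int8_threshold": 6.0}

LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"],
    "task_type": "CAUSAL_LM",  # assumption
}

TRAINING_KWARGS = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 2,  # assumption: 4 x 2 = effective batch of 8
    "learning_rate": 2e-4,
    "fp16": True,
    "optim": "paged_adamw_8bit",
    "gradient_checkpointing": True,
}


def build_configs(output_dir: str = "outputs"):
    """Instantiate the config objects (needs transformers, peft, and a GPU setup)."""
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig, TrainingArguments

    return (
        BitsAndBytesConfig(**BNB_KWARGS),
        LoraConfig(**LORA_KWARGS),
        TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS),
    )


effective_batch = (
    TRAINING_KWARGS["per_device_train_batch_size"]
    * TRAINING_KWARGS["gradient_accumulation_steps"]
)
```

The library imports sit inside `build_configs` so the values can be inspected on a machine without the training dependencies installed.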

Tips for Best Results

Temperature: 0.2–0.4
Top-p: 0.85–0.95
Max New Tokens: 256–512
Input Formatting:
- Include complete function signatures
- Remove unnecessary comments
- Keep functions under 200 lines
- For long functions, split into logical units
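
These recommendations can be checked before a request is sent to the model. The following is a minimal sketch; `RECOMMENDED`, `MAX_FUNCTION_LINES`, and `check_request` are hypothetical names, and the line count is a simple newline count rather than a real C/C++ parse.

```python
# Suggested ranges and the line limit, copied from the tips above.
RECOMMENDED = {
    "temperature": (0.2, 0.4),
    "top_p": (0.85, 0.95),
    "max_new_tokens": (256, 512),
}
MAX_FUNCTION_LINES = 200


def check_request(source_code: str, **settings) -> list:
    """Return a list of warnings; an empty list means the tips are followed."""
    warnings = []
    for name, value in settings.items():
        if name in RECOMMENDED:
            low, high = RECOMMENDED[name]
            if not low <= value <= high:
                warnings.append(
                    f"{name}={value} is outside the suggested range {low}-{high}"
                )
    line_count = len(source_code.strip().splitlines())
    if line_count >= MAX_FUNCTION_LINES:
        warnings.append(
            f"input has {line_count} lines; consider splitting it into logical units"
        )
    return warnings


ok = check_request(
    "int add(int a, int b) { return a + b; }",
    temperature=0.3,
    top_p=0.9,
    max_new_tokens=512,
)
too_long = check_request("x = x + 1;\n" * 250, temperature=1.0)
```

Settings that are not listed in `RECOMMENDED` pass through unchecked, so the helper can be called with any subset of generation parameters.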

Feedback & Citation

Dataset Credit: athrv/Embedded_Unittest2
Report Issues: Model's Hugging Face page

Maintainer: Utkarsh524
Model Version: v9 (4-epoch trained)
