CodeLlama Embedded Test Generator (v9)

This repository hosts an instruction-tuned CodeLlama-7B model that generates production-grade C/C++ unit tests for embedded systems. The model combines the base codellama/CodeLlama-7b-hf model with a custom LoRA adapter trained on a curated dataset of embedded software tests.


Prompt Schema

<|system|>
Generate unit tests for C/C++ code. Cover all edge cases, boundary conditions, and error scenarios.
Output Constraints:

  1. ONLY include test code (no explanations, headers, or main functions)
  2. Start directly with TEST(...)
  3. End after last test case
  4. Never include framework boilerplate

<|user|>
Write test cases for the following C/C++ code:
{your C/C++ function here}

<|assistant|>
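The schema can also be filled in programmatically. The sketch below is a minimal helper that follows the line layout used in this card's inference example (role tags on their own lines, constraints unnumbered). `SYSTEM_PROMPT` and `build_prompt` are hypothetical names chosen for illustration, not part of the model's API.

```python
# Hypothetical helper: fills the prompt schema with a C/C++ snippet.
SYSTEM_PROMPT = (
    "Generate unit tests for C/C++ code. Cover all edge cases, "
    "boundary conditions, and error scenarios.\n"
    "Output Constraints:\n"
    "ONLY include test code (no explanations, headers, or main functions)\n"
    "Start directly with TEST(...)\n"
    "End after last test case\n"
    "Never include framework boilerplate"
)


def build_prompt(code: str) -> str:
    """Return the full prompt for one C/C++ function or snippet."""
    return (
        f"<|system|>\n{SYSTEM_PROMPT}\n\n"
        f"<|user|>\nWrite test cases for the following C/C++ code:\n"
        f"{code.strip()}\n\n"
        f"<|assistant|>\n"
    )


prompt = build_prompt("int add(int a, int b) { return a + b; }")
```

Because the C/C++ source is passed in as a value rather than written inside an f-string literal, its braces do not need to be doubled.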


Quick Inference Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver9"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Braces in the C code are doubled ({{ }}) because the prompt is an f-string.
prompt = f"""<|system|>
Generate unit tests for C/C++ code. Cover all edge cases, boundary conditions, and error scenarios.
Output Constraints:
ONLY include test code (no explanations, headers, or main functions)
Start directly with TEST(...)
End after last test case
Never include framework boilerplate

<|user|>
Write test cases for the following C/C++ code:
int add(int a, int b) {{ return a + b; }}

<|assistant|>
"""

# A single prompt needs no padding; send the tensors to the device the model was placed on.
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=4096,
).to(model.device)

# do_sample=True is required for temperature and top_p to take effect.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3, top_p=0.9)

# generate() returns a batch; decode only the newly generated tokens of the first sequence.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True).strip())
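Sampled output can contain stray text before the first test, or a final test cut off when the token limit is reached. The sketch below is one possible post-processing step, assuming the output uses `TEST(...)`-style macros with brace-delimited bodies. `extract_tests` is a hypothetical helper, and its brace matching is naive: braces inside string literals or comments would confuse it.

```python
import re

# Matches the start of a test macro such as TEST(Suite, Name) or TEST_F(Fixture, Name).
TEST_START = re.compile(r"\bTEST(?:_[A-Z]+)?\s*\(")


def extract_tests(generated: str) -> list[str]:
    """Return complete TEST blocks; skips leading prose and a truncated final block."""
    tests = []
    for match in TEST_START.finditer(generated):
        open_brace = generated.find("{", match.end())
        if open_brace == -1:
            break
        depth = 0
        # Walk forward until the brace that closes this test body.
        for i in range(open_brace, len(generated)):
            ch = generated[i]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    tests.append(generated[match.start():i + 1])
                    break
    return tests


sample = (
    "Here are the tests:\n"
    "TEST(AddTest, Positive) { EXPECT_EQ(add(1, 2), 3); }\n"
    "TEST(AddTest, Negative) { EXPECT_EQ(add(-1, -2), -3); }\n"
    "TEST(AddTest, Trunc) { EXPECT_EQ(add(1,"
)
tests = extract_tests(sample)
print(len(tests))  # the truncated third test is dropped
```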

Training & Optimization Details

Step            Description
Dataset         athrv/Embedded_Unittest2 (filtered for valid code-test pairs)
Preprocessing   Token length filtering (≤4096), special token injection
Quantization    8-bit (BitsAndBytesConfig), llm_int8_threshold=6.0
LoRA Config     r=64, alpha=32, dropout=0.1 on q_proj/v_proj/k_proj/o_proj
Training        4 epochs, batch=4 (effective 8), lr=2e-4, FP16
Optimization    Paged AdamW 8-bit, gradient checkpointing, custom data collator
Special Tokens  Added `<
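
For readers who want to set up something similar, the quantization and LoRA settings above map onto keyword arguments of `transformers.BitsAndBytesConfig` and `peft.LoraConfig`. The sketch below only collects them in plain dictionaries and derives two implied quantities; it is not the author's training script. The gradient-accumulation figure assumes the effective batch of 8 comes from accumulation on a single device, which the card does not state, and `task_type` is likewise an assumption.

```python
# Settings transcribed from the training details; keyword names follow
# transformers.BitsAndBytesConfig and peft.LoraConfig.
quantization_kwargs = {"load_in_8bit": True, "llm_int8_threshold": 6.0}

lora_kwargs = {
    "r": 64,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

per_device_batch = 4
effective_batch = 8

# Assumption: single device, so the remaining factor is gradient accumulation.
grad_accum_steps = effective_batch // per_device_batch

# Standard LoRA scales the low-rank update by lora_alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

print(f"gradient accumulation steps: {grad_accum_steps}")  # 2
print(f"LoRA scaling factor: {lora_scaling}")              # 0.5
```

With both libraries installed, the dictionaries could be unpacked as `BitsAndBytesConfig(**quantization_kwargs)` and `LoraConfig(**lora_kwargs)`. A scaling factor of 0.5 means the adapter update is applied at half the weight it would have with alpha equal to r.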

Tips for Best Results

  • Temperature: 0.2–0.4
  • Top-p: 0.85–0.95
  • Max New Tokens: 256–512
  • Input Formatting:
    • Include complete function signatures
    • Remove unnecessary comments
    • Keep functions under 200 lines
    • For long functions, split into logical units
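
The input-formatting tips can be applied automatically before a prompt is built. The sketch below is a rough illustration, assuming a regex-based comment stripper is adequate for the snippet at hand: it leaves string and character literals alone but does not handle every preprocessor corner case. `strip_c_comments` and `prepare_input` are hypothetical helper names.

```python
import re

# Matches comments, plus string/char literals so comment markers inside them are left alone.
_TOKEN = re.compile(
    r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'',
    re.DOTALL,
)


def strip_c_comments(code: str) -> str:
    """Remove // and /* */ comments, keeping literals untouched."""
    def replace(match: re.Match) -> str:
        text = match.group(0)
        # Comments start with '/', literals start with a quote character.
        return " " if text.startswith("/") else text
    return _TOKEN.sub(replace, code)


def prepare_input(code: str, max_lines: int = 200) -> str:
    """Strip comments and blank lines; raise if the result exceeds max_lines."""
    lines = [ln.rstrip() for ln in strip_c_comments(code).splitlines() if ln.strip()]
    if len(lines) > max_lines:
        raise ValueError(f"{len(lines)} lines; split the function into smaller units")
    return "\n".join(lines)


source = '''
// Adds two integers.
int add(int a, int b) {
    /* no overflow check */
    return a + b;  // result
}
const char *url = "http://example.com";
'''
cleaned = prepare_input(source)
print(cleaned)
```

The 200-line default mirrors the tip above. Raising an error rather than truncating keeps the decision about where to split a long function with the caller.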

Feedback & Citation

Dataset Credit: athrv/Embedded_Unittest2
Report Issues: Model's Hugging Face page

Maintainer: Utkarsh524
Model Version: v9 (4-epoch trained)

Model Size: 6.74B params (Safetensors)
Tensor Type: F16