CodeLlama Embedded Test Generator (v10)
This model generates production-grade unit tests for embedded C/C++ code. It is CodeLlama-7B with a fine-tuned LoRA adapter merged back into the base weights, trained with:
- 8-bit quantization
- Flash Attention 2
- Linear RoPE scaling (factor=2.0)
- Custom instruction tuning on embedded unit tests
Key Features
- Generates framework-agnostic test cases
- Optimized for embedded systems constraints
- Strict output formatting (no boilerplate)
- Special tokens for structured prompting
- 8192 context window support
Technical Specifications
| Component | Configuration |
|---|---|
| Base Model | CodeLlama-7B-HF |
| Fine-tuning | LoRA (r=64, alpha=32) |
| Quantization | 8-bit (llm_int8_threshold=6.0) |
| Attention | Flash Attention 2 |
| Context Window | 8192 tokens (RoPE scaled) |
| Training Epochs | 2 |
| Batch Size | 2 (effective 8 with gradient accumulation) |
| Learning Rate | 1.5e-4 |
| Optimizer | Paged AdamW 8-bit |
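The quantization, attention, and RoPE settings above correspond roughly to the following `transformers`/`bitsandbytes` loading configuration. This is a sketch inferred from the table, not the exact training script:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantization with the int8 threshold listed in the table
bnb_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",         # Flash Attention 2
    rope_scaling={"type": "linear", "factor": 2.0},  # linear RoPE scaling for the 8192-token window
    torch_dtype=torch.float16,
    device_map="auto",
)
```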
🧪 Prompt Structure
```
<|system|>
Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
Testing all functions and edge cases
Avoiding redundant headers
Covering boundary conditions and error scenarios
Using clear test names without repetitions
Generate ONLY test logic without framework-specific macros.
<|user|>
Generate unit tests for:
{your_function_here}
<|assistant|>
```
🚀 Inference Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver10"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2"  # Recommended for speed
)

def generate_tests(function_code):
    prompt = f"""<|system|>
Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
Testing all functions and edge cases
Avoiding redundant headers
Covering boundary conditions and error scenarios
Using clear test names without repetitions
Generate ONLY test logic without framework-specific macros.
<|user|>
Generate unit tests for:
{function_code}
<|assistant|>
"""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=8192).to("cuda")
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )
    # Decode only the newly generated tokens, skipping the echoed prompt
    generated = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

# Example usage
print(generate_tests("int add(int a, int b) { return a + b; }"))
```
Training Details
Dataset
- Source: `athrv/Embedded_Unittest2`
- Processing:
  - Filtered invalid/empty examples
  - Token length limit: 8192 tokens
  - Added special tokens: `<|system|>`, `<|user|>`, `<|assistant|>`, `<|end|>`
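A rough sketch of these processing steps with the `datasets` library is shown below. The actual preprocessing script is not part of this card, and the column names (`code`, `unit_test`) are assumptions rather than the dataset's real schema:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

MAX_LEN = 8192
SPECIAL_TOKENS = ["<|system|>", "<|user|>", "<|assistant|>", "<|end|>"]

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
tokenizer.add_special_tokens({"additional_special_tokens": SPECIAL_TOKENS})

ds = load_dataset("athrv/Embedded_Unittest2", split="train")

# Drop invalid/empty examples (column names are assumptions)
ds = ds.filter(lambda ex: ex.get("code") and ex.get("unit_test"))

# Enforce the 8192-token limit on the combined source + test text
ds = ds.filter(
    lambda ex: len(tokenizer(ex["code"] + ex["unit_test"]).input_ids) <= MAX_LEN
)
```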
LoRA Configuration
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",  # all linear layers
    ],
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)
```
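Before training, this configuration is typically applied to the quantized base model with `peft`. A minimal sketch (not the exact training script), assuming the 8-bit base model loaded as in the configuration sketch above:

```python
from peft import get_peft_model, prepare_model_for_kbit_training

# `model` is the 8-bit CodeLlama-7B base; `lora_config` is the LoraConfig above
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```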
Merge Process
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# Load and merge adapter
model = PeftModel.from_pretrained(base_model, "codellama_utests_optimized")
merged_model = model.merge_and_unload()

# Special token handling
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|system|>", "<|user|>", "<|assistant|>", "<|end|>"]}
)
base_model.resize_token_embeddings(len(tokenizer))
```
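After merging, the standalone model and extended tokenizer are typically saved so the result can be loaded without `peft`; the output path below is illustrative:

```python
# Save the merged weights and the extended tokenizer as a standalone checkpoint
merged_model.save_pretrained("codellama_utests_merged")  # illustrative output path
tokenizer.save_pretrained("codellama_utests_merged")
```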
Optimization Tips
- Hardware: Use a GPU with at least 24 GB of VRAM (A10/A100 recommended)
- Inference:
- Temperature: 0.2-0.4
- Top-p: 0.85-0.95
- Max New Tokens: 256-768
- Input Formatting:
- Keep functions under 200 lines
- Include complete signatures
- Avoid preprocessor directives
- Dataset Attribution: `athrv/Embedded_Unittest2`
Maintainer: Utkarsh524
Model ID: codellama_utests_full_new_ver10