CodeLlama Embedded Test Generator (v10)

This model generates production-grade unit tests for embedded C/C++ code. It is CodeLlama-7B with a fine-tuned LoRA adapter merged back into the base weights, trained with:

  • 8-bit quantization
  • Flash Attention 2
  • Linear RoPE scaling (factor=2.0)
  • Custom instruction tuning on embedded unit tests

Key Features

  • Generates framework-agnostic test cases
  • Optimized for embedded systems constraints
  • Strict output formatting (no boilerplate)
  • Special tokens for structured prompting
  • 8192 context window support

Technical Specifications

Component         Configuration
Base Model        CodeLlama-7B-HF
Fine-tuning       LoRA (r=64, alpha=32)
Quantization      8-bit (llm_int8_threshold=6.0)
Attention         Flash Attention 2
Context Window    8192 tokens (RoPE scaled)
Training Epochs   2
Batch Size        2 (effective 8 with grad accum)
Learning Rate     1.5e-4
Optimizer         Paged AdamW 8-bit
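
For illustration, the configuration above could be wired up roughly as follows when loading the base model for fine-tuning. This is a minimal sketch, not the actual training script; every argument name beyond the values listed in the table is an assumption.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

# 8-bit quantization with the threshold from the table above
bnb_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)

base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",          # Flash Attention 2
    rope_scaling={"type": "linear", "factor": 2.0},   # linear RoPE scaling: 4096 -> 8192 context
    device_map="auto",
    torch_dtype=torch.float16,
)

# Training hyperparameters from the table (all other arguments omitted)
training_args = TrainingArguments(
    output_dir="codellama_utests_optimized",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size 8
    learning_rate=1.5e-4,
    optim="paged_adamw_8bit",
    fp16=True,
)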

πŸ§ͺ Prompt Structure

<|system|>
Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
Testing all functions and edge cases
Avoiding redundant headers
Covering boundary conditions and error scenarios
Using clear test names without repetitions
Generate ONLY test logic without framework-specific macros.

<|user|>
Generate unit tests for:
{your_function_here}

<|assistant|>

πŸš€ Inference Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver10"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # Recommended for speed
)

def generate_tests(function_code):
    prompt = f"""<|system|>
Generate comprehensive, framework-agnostic unit tests for C/C++ code. Focus on:
Testing all functions and edge cases
Avoiding redundant headers
Covering boundary conditions and error scenarios
Using clear test names without repetitions
Generate ONLY test logic without framework-specific macros.

<|user|>
Generate unit tests for:
{function_code}

<|assistant|>
"""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=8192).to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens (everything after the prompt)
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

# Example usage
print(generate_tests("int add(int a, int b) { return a + b; }"))

Training Details

Dataset

  • Source: athrv/Embedded_Unittest2
  • Processing:
    • Filtered invalid/empty examples
    • Token length limit: 8192
    • Added special tokens: <|system|>, <|user|>, <|assistant|>, <|end|>
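
A rough sketch of that preprocessing is shown below. Column names such as "code" and "test" are assumptions about the dataset schema, not confirmed field names, and the filter itself is illustrative rather than the original pipeline.

from datasets import load_dataset

SPECIAL_TOKENS = ["<|system|>", "<|user|>", "<|assistant|>", "<|end|>"]
tokenizer.add_special_tokens({"additional_special_tokens": SPECIAL_TOKENS})

raw = load_dataset("athrv/Embedded_Unittest2", split="train")

def keep(example):
    # Drop invalid/empty rows and anything over the 8192-token limit
    code, test = example.get("code"), example.get("test")
    if not code or not test:
        return False
    text = f"<|user|>\nGenerate unit tests for:\n{code}\n<|assistant|>\n{test}<|end|>"
    return len(tokenizer(text).input_ids) <= 8192

dataset = raw.filter(keep)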

LoRA Configuration

LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"  # All linear layers
    ],
    lora_dropout=0.1,
    task_type="CAUSAL_LM"
)
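
In context, this adapter config would typically be attached to the 8-bit base model along the lines below, where base is the quantized base model from the earlier sketch. This is standard PEFT usage, not code taken from the original training script.

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

base = prepare_model_for_kbit_training(base)   # prepare the 8-bit model for adapter training
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()             # only the LoRA matrices are trainable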

Merge Process

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# Load and merge adapter
model = PeftModel.from_pretrained(base_model, "codellama_utests_optimized")
merged_model = model.merge_and_unload()

# Special token handling
tokenizer.add_special_tokens({"additional_special_tokens": ["<|system|>", ...]})
base_model.resize_token_embeddings(len(tokenizer))
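
To publish the standalone checkpoint, the merged model and the extended tokenizer would then be saved together; the sketch below assumes local paths and repo names matching the published model ID.

# Save the merged weights plus the tokenizer carrying the added special tokens
merged_model.save_pretrained("codellama_utests_full_new_ver10", safe_serialization=True)
tokenizer.save_pretrained("codellama_utests_full_new_ver10")

# Optionally push to the Hub under the published model ID
# merged_model.push_to_hub("Utkarsh524/codellama_utests_full_new_ver10")
# tokenizer.push_to_hub("Utkarsh524/codellama_utests_full_new_ver10")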

Optimization Tips

  1. Hardware: Use GPUs with >24GB VRAM (A10/A100 recommended)
  2. Inference:
    • Temperature: 0.2-0.4
    • Top-p: 0.85-0.95
    • Max New Tokens: 256-768
  3. Input Formatting:
    • Keep functions under 200 lines
    • Include complete signatures
    • Avoid preprocessor directives

Dataset Attribution: athrv/Embedded_Unittest2

Maintainer: Utkarsh524
Model ID: codellama_utests_full_new_ver10
