🧪 CodeLLaMA Unit Test Generator: LoRA Adapter (v3)

This is a LoRA adapter trained on embedded C/C++ functions and their corresponding unit tests using the athrv/Embedded_Unittest2 dataset.

The adapter is meant to be used with codellama/CodeLlama-7b-hf and enhances its ability to generate production-ready C/C++ unit tests, especially for embedded systems.


🚀 Key Improvements in v3

  • ✅ Enhanced instruction prompt tuning using <|system|>, <|user|>, <|assistant|>
  • 🧹 Stripped out #include, main() and framework boilerplate from training targets
  • 🔚 Appended // END_OF_TESTS to each output to guide model termination
  • 🧠 Fine-tuned with sequence length of 4096 tokens for long-context unit tests
  • 🤖 Optimized for frameworks like CppUTest or GoogleTest
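
The first three points above describe how each training example is laid out. The exact preprocessing script is not part of this card, so the following is only a minimal sketch of what such formatting could look like. The helper names `clean_target` and `format_example`, the "Create tests for:" wording (borrowed from the usage prompt), and the cleanup rules are assumptions for illustration, not the author's actual pipeline.

```python
import re

END_MARKER = "// END_OF_TESTS"


def clean_target(test_code: str) -> str:
    """Drop #include lines and a simple main() definition (hypothetical rules)."""
    lines = [
        ln for ln in test_code.splitlines()
        if not ln.lstrip().startswith("#include")
    ]
    text = "\n".join(lines)
    # Remove `int main(...) { ... }`; assumes main() has no nested braces.
    text = re.sub(r"int\s+main\s*\([^)]*\)\s*\{[^{}]*\}", "", text)
    return text.strip()


def format_example(system: str, source: str, tests: str) -> str:
    """Build one training string: role tags, cleaned tests, then the end marker."""
    return (
        f"<|system|>\n{system}\n"
        f"<|user|>\nCreate tests for:\n{source}\n"
        f"<|assistant|>\n{clean_target(tests)}\n{END_MARKER}"
    )
```

Under these assumptions, a target that arrives with a framework header and a test runner `main()` is reduced to the bare `TEST(...)` bodies, followed by the end marker.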

🔧 How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "codellama/CodeLlama-7b-hf"
adapter_id = "Utkarsh524/codellama_utests_embedded_v3"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
tokenizer.pad_token = tokenizer.eos_token

# Load base model
base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)

# Resize to match tokenizer with special tokens
base.resize_token_embeddings(len(tokenizer))

# Attach LoRA adapter
model = PeftModel.from_pretrained(base, adapter_id)

# Prepare prompt
prompt = """<|system|>
Generate comprehensive unit tests for C/C++ code. Cover all edge cases, boundary conditions, and error scenarios.
Output Constraints:
1. ONLY include test code (no explanations, headers, or main functions)
2. Start directly with TEST(...)
3. End after last test case
4. Never include framework boilerplate
<|user|>
Create tests for:
int factorial(int n) { return (n <= 1) ? 1 : n * factorial(n - 1); }
<|assistant|>
"""

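# Send inputs to the model's device (works with device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Always stop at EOS; also stop at "// END_OF_TESTS" if the tokenizer
# stores that marker as a single token (otherwise the lookup yields unk/None)
stop_ids = [tokenizer.eos_token_id]
end_id = tokenizer.convert_tokens_to_ids("// END_OF_TESTS")
if end_id is not None and end_id != tokenizer.unk_token_id:
    stop_ids.append(end_id)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=stop_ids,
    pad_token_id=tokenizer.pad_token_id,
)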
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
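
If the end marker is not a single token in the tokenizer, generation may run past it, and the decoded string may also contain the prompt. As a hedged sketch, the decoded text can be post-processed in plain Python. The helper name `extract_tests` is hypothetical and not part of this adapter or of transformers.

```python
END_MARKER = "// END_OF_TESTS"
ASSISTANT_TAG = "<|assistant|>"


def extract_tests(decoded: str) -> str:
    """Return only the generated tests from a decoded model output."""
    # Keep what follows the last assistant tag; if the tag is absent,
    # rpartition leaves the whole string in the third slot.
    _, _, completion = decoded.rpartition(ASSISTANT_TAG)
    # Cut everything from the first end marker onward.
    completion = completion.split(END_MARKER, 1)[0]
    return completion.strip()
```

For example, `extract_tests(tokenizer.decode(outputs[0]))` would return just the test bodies. If the role tags are dropped during decoding (for instance with `skip_special_tokens=True` when they are registered as special tokens), the prompt text will remain; slicing the prompt tokens off `outputs[0]` before decoding is one alternative in that case.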