---
license: apache-2.0
language: c++
tags:
- code-generation
- codellama
- peft
- unit-tests
- causal-lm
- text-generation
- embedded-systems
base_model: codellama/CodeLLaMA-7b-hf
model_type: llama
pipeline_tag: text-generation
---

# 🧪 CodeLLaMA Comprehensive Test Generator (Merged v8)

This repository hosts a **merged, instruction-tuned** CodeLLaMA-7B model that generates **production-grade C/C++ unit tests** for embedded and general-purpose code. It combines the base [codellama/CodeLLaMA-7b-hf](https://huggingface.co/codellama/CodeLLaMA-7b-hf) model with a custom LoRA adapter trained on a cleaned, constraint-driven unit test dataset.

---

## Prompt Schema

`<|system|>`

Generate unit tests for C/C++ code following these guidelines:

- Cover all edge cases, boundary conditions, and error scenarios
- Include both positive and negative test cases
- Test minimum/maximum values and invalid inputs
- Verify error handling and exception cases

Output Requirements:

- ONLY include test implementation code
- Start directly with test logic
- Include necessary assertions
- End naturally after last test case
- Never include framework boilerplate or headers

`<|user|>`

Create unit tests for:

`{your C/C++ function here}`

`<|assistant|>`
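
The schema can be assembled programmatically, which avoids escaping braces in the C/C++ source by hand. The following is a minimal sketch: `SYSTEM_PROMPT` and `build_prompt` are hypothetical names that are not part of this repository, and the exact whitespace and list markers between sections are assumptions, since the training template is only described at the level shown here.

```python
# Hypothetical helper for illustration; not shipped with the model.
# Guideline text is copied from the schema; list markers are omitted (assumption).
SYSTEM_PROMPT = """Generate unit tests for C/C++ code following these guidelines:
Cover all edge cases, boundary conditions, and error scenarios
Include both positive and negative test cases
Test minimum/maximum values and invalid inputs
Verify error handling and exception cases

Output Requirements:
ONLY include test implementation code
Start directly with test logic
Include necessary assertions
End naturally after last test case
Never include framework boilerplate or headers"""


def build_prompt(function_source: str) -> str:
    """Wrap a C/C++ function in the <|system|>/<|user|>/<|assistant|> layout."""
    return (
        f"<|system|>\n{SYSTEM_PROMPT}\n\n"
        f"<|user|>\nCreate unit tests for:\n{function_source.strip()}\n\n"
        "<|assistant|>\n"
    )


if __name__ == "__main__":
    print(build_prompt("int add(int a, int b) { return a + b; }"))
```

Because the function source is passed in as a value rather than written inside an f-string literal, its `{` and `}` characters need no doubling.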

---

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# In an f-string, doubled braces {{ }} render as literal { } in the C code.
prompt = f"""<|system|>
1.Generate unit tests for C/C++ code following these guidelines:
2.Cover all edge cases, boundary conditions, and error scenarios
3.Include both positive and negative test cases
4.Test minimum/maximum values and invalid inputs
5.Verify error handling and exception cases

Output Requirements:
-ONLY include test implementation code
-Start directly with test logic
-Include necessary assertions
-End naturally after last test case
-Never include framework boilerplate or headers

<|user|>
Create unit tests for:
int add(int a, int b) {{ return a + b; }}

<|assistant|>
"""

# A single prompt needs no padding; truncate anything beyond 4096 tokens.
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=4096
).to(model.device)  # follow the device chosen by device_map="auto"

outputs = model.generate(**inputs, max_new_tokens=512)
# generate() returns a batch of sequences; decode the first (and only) one.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 📊 Training & Merge Details

| Step | Description |
|---------------------|-----------------------------------------------------------------------------|
| Dataset | athrv/Embedded_Unittest2 (filtered, cleaned, CSV export available) |
| LoRA Config | r=64, alpha=32, dropout=0.1 on q_proj/v_proj/k_proj/o_proj |
| Instructions | Custom `<\|system\|>`, `<\|user\|>`, `<\|assistant\|>` prompt format |
| Data Cleaning | Regex-strip `#include` lines, `main()`, and boilerplate; keep only test blocks |
| Merge Process | `model.merge_and_unload()`, then `save_pretrained()` + `upload_folder()` |
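
The LoRA settings and merge steps in the table could be expressed roughly as follows. This is a sketch under assumptions rather than the maintainer's actual script: `LORA_KWARGS` and `merge_and_upload` are hypothetical names, `task_type` is not stated in the table, and saving the base model's tokenizer alongside the merged weights is an assumption as well.

```python
# Values copied from the table; key names follow peft.LoraConfig's parameters.
LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the table
}


def merge_and_upload(base_model_id, adapter_dir, output_dir, repo_id):
    """Hypothetical helper: fold a trained LoRA adapter into its base model and upload it."""
    # Imported lazily so the config above can be inspected without the heavy dependencies.
    import torch
    from huggingface_hub import HfApi
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # folds the LoRA weights into the base layers
    merged.save_pretrained(output_dir)
    # Assumption: the tokenizer is taken unchanged from the base model.
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(output_dir)
    HfApi().upload_folder(folder_path=output_dir, repo_id=repo_id, repo_type="model")
```

For training, the same dictionary can be passed as `peft.LoraConfig(**LORA_KWARGS)`.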

---

## 🔧 Tips for Best Results

- **Temperature:** 0.2–0.4
- **Top-p:** 0.9
- **Keep function code self-contained and under 200 lines**
- **For very long functions, split into logical units and generate tests per unit**
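
The sampling tips above map onto `generate()` arguments roughly as shown below. This is a minimal sketch: `GENERATION_KWARGS` and `generate_tests` are hypothetical names, 0.3 is simply the midpoint of the suggested temperature range, and `model` and `tokenizer` are assumed to be loaded with `transformers` as usual.

```python
# Sampling settings taken from the tips; 0.3 is the midpoint of 0.2-0.4.
GENERATION_KWARGS = {
    "do_sample": True,  # temperature and top_p only take effect when sampling is enabled
    "temperature": 0.3,
    "top_p": 0.9,
    "max_new_tokens": 512,
}


def generate_tests(model, tokenizer, prompt, **overrides):
    """Hypothetical wrapper: run generation and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, **{**GENERATION_KWARGS, **overrides})
    # Skip the prompt tokens so the result contains just the model's completion.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
```

A call such as `generate_tests(model, tokenizer, prompt, temperature=0.2)` overrides a single setting while keeping the rest.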

---

## 🤝 Feedback & Citation

If you use this model, please cite the CodeLLaMA paper and credit the athrv/Embedded_Unittest2 dataset.

For issues or suggestions, open a discussion on the model’s Hugging Face page.

Maintainer: Utkarsh524