---
license: apache-2.0
language: c++
tags:
- code-generation
- codellama
- peft
- unit-tests
- causal-lm
- text-generation
- embedded-systems
base_model: codellama/CodeLlama-7b-hf
model_type: llama
pipeline_tag: text-generation
---
# 🧪 CodeLLaMA Comprehensive Test Generator (Merged v8)
This repository hosts a **merged, instruction-tuned** CodeLLaMA-7B model that generates **production-grade C/C++ unit tests** for
embedded and general-purpose code. It combines the base [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) model
with a custom LoRA adapter trained on a cleaned, constraint-driven unit test dataset.
---
## Prompt Schema
```
<|system|>
Generate unit tests for C/C++ code following these guidelines:
- Cover all edge cases, boundary conditions, and error scenarios
- Include both positive and negative test cases
- Test minimum/maximum values and invalid inputs
- Verify error handling and exception cases
Output Requirements:
- ONLY include test implementation code
- Start directly with test logic
- Include necessary assertions
- End naturally after last test case
- Never include framework boilerplate or headers
<|user|>
Create unit tests for:
{your C/C++ function here}
<|assistant|>
```
---
## 🚀 Usage Example
Load the merged model with 🤗 Transformers and generate tests directly (a CUDA-capable GPU is assumed for fp16 and `device_map="auto"`):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Utkarsh524/codellama_utests_full_new_ver8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# LLaMA-family tokenizers ship without a pad token; reuse EOS so padding works.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompt = """<|system|>
Generate unit tests for C/C++ code following these guidelines:
- Cover all edge cases, boundary conditions, and error scenarios
- Include both positive and negative test cases
- Test minimum/maximum values and invalid inputs
- Verify error handling and exception cases
Output Requirements:
- ONLY include test implementation code
- Start directly with test logic
- Include necessary assertions
- End naturally after last test case
- Never include framework boilerplate or headers
<|user|>
Create unit tests for:
int add(int a, int b) { return a + b; }
<|assistant|>
"""

inputs = tokenizer(
    prompt,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=4096,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode the first (and only) sequence in the batch.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
---
## 📊 Training & Merge Details
| Step          | Description                                                                   |
|---------------|-------------------------------------------------------------------------------|
| Dataset       | `athrv/Embedded_Unittest2` (filtered and cleaned; CSV export available)        |
| LoRA Config   | r=64, alpha=32, dropout=0.1 on `q_proj`/`v_proj`/`k_proj`/`o_proj`             |
| Prompt Format | Custom system / user / assistant tags (see the Prompt Schema above)            |
| Data Cleaning | Regex-stripped `#include`s, `main()`, and boilerplate; kept only test blocks   |
| Merge Process | `model.merge_and_unload()`, then `save_pretrained()` + `upload_folder()`       |
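For reference, the merge flow in the table can be reproduced with `peft` and `huggingface_hub`. The sketch below is illustrative, not the exact training script: the local adapter path is a placeholder, and the `LoraConfig` simply restates the training-time settings from the table.
```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel
from huggingface_hub import HfApi

# Training-time adapter configuration restated from the table (reference only).
lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Merge: load the base model, attach the trained adapter, fold the LoRA
# weights into the base weights, then save and upload the merged model.
base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder adapter path
merged = merged.merge_and_unload()
merged.save_pretrained("merged_model")

HfApi().upload_folder(
    folder_path="merged_model",
    repo_id="Utkarsh524/codellama_utests_full_new_ver8",
)
```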
---
## 🔧 Tips for Best Results
- **Temperature:** 0.2–0.4
- **Top-p:** 0.9 (a sampling sketch with these settings follows this list)
- **Keep function code self-contained and under 200 lines**
- **For very long functions, split into logical units and generate tests per unit**
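Continuing from the usage example above (reusing `model`, `tokenizer`, and `inputs`), a sampling call with these settings might look like this; the exact values are a starting point, not a requirement:
```python
# Sampled generation with the recommended settings (temperature 0.2-0.4, top-p 0.9).
outputs = model.generate(
    **inputs,
    do_sample=True,       # enable sampling so temperature/top_p take effect
    temperature=0.3,
    top_p=0.9,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```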
---
## 🤝 Feedback & Citation
If you use this model, please cite the CodeLLaMA paper and credit the athrv/Embedded_Unittest2 dataset.
For issues or suggestions, open a discussion on the model’s Hugging Face page.
Maintainer: Utkarsh524