Utkarsh524
/

codellama_utests_full_new_ver8

Text Generation

code-generation

embedded-systems

Model card Files Files and versions

codellama_utests_full_new_ver8 / README.md

Utkarsh524's picture

Update README.md

db051bb verified 3 months ago

|

history blame contribute delete

3.56 kB

	---
	license: apache-2.0
	language: c++
	tags:
	- code-generation
	- codellama
	- peft
	- unit-tests
	- causal-lm
	- text-generation
	- embedded-systems
	base_model: codellama/CodeLLaMA-7b-hf
	model_type: llama
	pipeline_tag: text-generation
	---
	# 🧪 CodeLLaMA Comprehensive Test Generator (Merged v8)
	This repository hosts a merged, instruction-tuned CodeLLaMA-7B model that generates production-grade C/C++ unit tests for
	embedded and general code. It combines the base [codellama/CodeLLaMA-7b-hf](https://huggingface.co/codellama/CodeLLaMA-7b-hf) model
	with a custom LoRA adapter trained on a cleaned, constraint-driven unit test dataset.
	---

	## Prompt Schema
	<\|system\|>
	Generate unit tests for C/C++ code following these guidelines:
	Cover all edge cases, boundary conditions, and error scenarios
	Include both positive and negative test cases
	Test minimum/maximum values and invalid inputs
	Verify error handling and exception cases
	Output Requirements:
	ONLY include test implementation code
	Start directly with test logic
	Include necessary assertions
	End naturally after last test case
	Never include framework boilerplate or headers

	<\|user\|>
	Create unit tests for:
	{your C/C++ function here}

	<\|assistant\|>

	---
	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	model_id = "Utkarsh524/codellama_utests_full_new_ver8"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

	prompt = f"""<\|system\|>
	1.Generate unit tests for C/C++ code following these guidelines:
	2.Cover all edge cases, boundary conditions, and error scenarios
	3.Include both positive and negative test cases
	4.Test minimum/maximum values and invalid inputs
	5.Verify error handling and exception cases

	Output Requirements:
	-ONLY include test implementation code
	-Start directly with test logic
	-Include necessary assertions
	-End naturally after last test case
	-Never include framework boilerplate or headers

	<\|user\|>
	Create unit tests for:
	int add(int a, int b) {{ return a + b; }}

	<\|assistant\|>
	"""

	inputs = tokenizer(
	prompt,
	return_tensors="pt",
	padding=True,
	truncation=True,
	max_length=4096
	).to("cuda")

	outputs = model.generate(**inputs, max_new_tokens=512)
	print(tokenizer.decode(outputs, skip_special_tokens=True))

	```
	---
	## 📊 Training & Merge Details
	\| Step \| Description \|
	\|---------------------\|-----------------------------------------------------------------------------\|
	\| Dataset \| athrv/Embedded_Unittest2 (filtered, cleaned, CSV export available) \|
	\| LoRA Config \| r=64, alpha=32, dropout=0.1 on q_proj/v_proj/k_proj/o_proj \|
	\| Instructions \| Custom `<\|system\|>`, `<\|user\|>`, `<\|assistant\|>` prompt format \|
	\| Data Cleaning \| Regex strip includes, main(), boilerplate; extract only test blocks \|
	\| Merge Process \| model.merge_and_unload(), then save_pretrained() + upload_folder() \|
	---
	## 🔧 Tips for Best Results
	- Temperature: 0.2–0.4
	- Top-p: 0.9
	- Keep function code self-contained and under 200 lines
	- For very long functions, split into logical units and generate tests per unit
	---
	## 🤝 Feedback & Citation
	If you use this model, please cite the CodeLLaMA paper and credit the athrv/Embedded_Unittest2 dataset.
	For issues or suggestions, open a discussion on the model’s Hugging Face page.
	Maintainer: Utkarsh524