LLaMA 3.2 3B - Java Code Generation (Reflection)
This model is a fine-tuned version of meta-llama/Llama-3.2-3B specifically trained for Java method generation using a novel reflection-based meta-learning approach.
Model Description
- Base Model: LLaMA 3.2 3B
- Training Method: Reflection-based Meta-Learning
- Task: Java method generation from natural language descriptions
- Training Data: 100k examples from CodeXGLUE dataset with Claude annotations
- Language: Java
- License: LLaMA 3.2 Community License
Training Details
Dataset
Trained on Naholav/llama3.2-java-codegen-90sft-10meta-claude-v1:
- 90,000 SFT examples for standard training
- 10,000 meta-annotated examples with Claude's error analysis and learning insights
- Source: CodeXGLUE text-to-code (Java) dataset
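If you want to inspect the training data yourself, it can be loaded directly from the Hugging Face Hub. The snippet below is a minimal sketch; the split name and the printed fields are assumptions, so check the dataset card for the actual schema.
from datasets import load_dataset

# Load the mixed SFT + meta dataset from the Hub (repo name as listed above).
# The "train" split is an assumption; inspect the dataset card for the real
# schema before relying on specific column names.
dataset = load_dataset("Naholav/llama3.2-java-codegen-90sft-10meta-claude-v1", split="train")
print(dataset)       # number of rows and available columns
print(dataset[0])    # one raw example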
Reflection-Based Training
This model uses a unique teacher-student reflection paradigm:
- Teacher: Claude 4 Sonnet provides error analysis and guidance
- Student: LLaMA 3.2 3B learns from its mistakes through structured reflection
- Meta examples include error analysis and learning insights for deeper understanding
Training Configuration
- Epochs: 3
- Batch Size: 8 per device × 6 gradient accumulation steps = 48 effective
- Learning Rate: 2e-5
- Max Length: 2048 tokens
- Precision: float32 (for stability)
- Optimizer: AdamW
- Scheduler: Cosine with warmup
- Early Stopping: Dual tracking (SFT and Meta losses)
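As a rough illustration, the hyperparameters listed above map onto a Hugging Face TrainingArguments configuration along these lines. This is a sketch, not the actual training script; the output directory and warmup fraction are assumptions.
from transformers import TrainingArguments

# Illustrative mapping of the listed hyperparameters; not the original script.
training_args = TrainingArguments(
    output_dir="llama3.2-3b-java-reflection",  # assumed path
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=6,             # 8 x 6 = 48 effective batch size
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,                         # warmup fraction not stated above; assumed
    optim="adamw_torch",
    fp16=False,
    bf16=False,                                # full float32 for stability, as noted
)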
Hardware
- GPU: NVIDIA A100 80GB
- Training Time: ~9 hours
- Framework: PyTorch 2.0+ with Transformers
Usage
Installation
pip install transformers torch
Quick Start
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load model and tokenizer
model_name = "Naholav/llama-3.2-3b-100k-codeXGLUE-reflection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
# Prepare prompt
task_description = "returns the sum of two integers"
prompt = f"""You are an expert Java programmer. Generate a complete, working Java method for the given description.
Task: {task_description}
Requirements:
- Write a complete Java method
- Use proper syntax and naming conventions
- Include return statements where needed
- Keep it concise but functional
```java
"""
# Generate code
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    temperature=0.2,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)
Expected Output Format
The model generates Java methods following this pattern:
public int sum(int a, int b) {
    return a + b;
}
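Because the prompt ends with an opening ```java fence and decoding returns the prompt followed by the completion, a small post-processing step helps isolate just the generated method. The helper below is a simple heuristic sketch, not part of the model's API.
# Trim the echoed prompt and stop at a closing fence, if the model emits one.
def extract_java(decoded, prompt):
    completion = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    return completion.split("```")[0].strip()

print(extract_java(generated_code, prompt))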
Testing on Your Own Data
For local evaluation, you can use:
- Test dataset from this project: 100 examples
- Original Microsoft test set: 2k examples
Important: Remember to clean the natural language descriptions before inference:
def clean_nl(nl_description):
    # Replace CodeXGLUE concode separator tokens and collapse extra whitespace
    cleaned = nl_description.replace("concode_field_sep", " | ")
    cleaned = cleaned.replace("concode_elem_sep", ", ")
    return ' '.join(cleaned.split())
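For example (the raw description below is made up to show the separator tokens, not copied from the dataset):
raw = "returns the sum concode_field_sep int a concode_elem_sep int b"
print(clean_nl(raw))
# -> returns the sum | int a , int b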
Performance
The model was evaluated during training with:
- Separate tracking of SFT and Meta losses
- 5 evaluations per epoch
- Dual early stopping based on both loss types
- Best checkpoint selected based on average validation loss
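The dual early-stopping rule is not spelled out in detail here; the sketch below shows one plausible interpretation, in which training stops only after both tracked losses have gone a set number of evaluations without improving. The patience value and the overall structure are assumptions.
# Plausible sketch of dual early stopping; not the actual training code.
patience = 3  # assumed number of evaluations without improvement
best = {"sft": float("inf"), "meta": float("inf")}
stale = {"sft": 0, "meta": 0}

def should_stop(sft_loss, meta_loss):
    for name, loss in (("sft", sft_loss), ("meta", meta_loss)):
        if loss < best[name]:
            best[name], stale[name] = loss, 0
        else:
            stale[name] += 1
    return stale["sft"] >= patience and stale["meta"] >= patience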
Reflection Training Methodology
This model was trained using a novel approach where:
- Error Recognition: Model learns to identify common coding mistakes
- Pattern Analysis: Understands method signatures and class structures
- Knowledge Gaps: Recognizes missing OOP concepts
- Improvement Strategy: Internalizes better coding patterns
Meta examples included structured reflection prompts with:
- Student's incorrect attempt
- Teacher's correct implementation
- Detailed error analysis
- Learning insights and guidance
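The exact prompt wording used for the meta examples is not reproduced here; the template below is a hypothetical illustration of how those four components could be laid out, with placeholder field names.
# Hypothetical layout of a reflection (meta) training example; the real
# field names and wording in the dataset may differ.
reflection_template = """Task: {task_description}

Student attempt (incorrect):
{student_code}

Teacher implementation (correct):
{teacher_code}

Error analysis:
{error_analysis}

Learning insight:
{learning_insight}
"""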
Comparison with SFT Model
This is the reflection-based version. For comparison with standard supervised fine-tuning, see:
- the companion SFT model
- the GitHub repository for implementation details
Limitations
- Trained specifically for Java method generation
- May not generalize well to full classes or other programming languages
- Best suited for single-method generation tasks
- Context window limited to 2048 tokens
Ethical Considerations
- The model should not be used to generate malicious code
- Generated code should be reviewed before use in production
- Not suitable for generating code that handles sensitive data without proper review
Key Differences from SFT Model
- Training Data: Uses the same dataset as the SFT model but processes the meta-annotated examples differently
- Learning Paradigm: Teacher-student reflection vs direct imitation
- Loss Tracking: Dual tracking of SFT and Meta losses
- Expected Benefit: Better understanding of coding patterns and error avoidance
Acknowledgments
- Meta AI for the LLaMA 3.2 base model
- Microsoft Research for the CodeXGLUE text-to-code (Java) dataset
- Anthropic for Claude 4 Sonnet's error analysis and insights
- Hugging Face for the training infrastructure