LLaMA 3.2 3B - Java Code Generation (Reflection)

This model is a fine-tuned version of meta-llama/Llama-3.2-3B specifically trained for Java method generation using a novel reflection-based meta-learning approach.

Model Description

  • Base Model: LLaMA 3.2 3B
  • Training Method: Reflection-based Meta-Learning
  • Task: Java method generation from natural language descriptions
  • Training Data: 100k examples from CodeXGLUE dataset with Claude annotations
  • Language: Java
  • License: LLaMA 3.2 Community License

Training Details

Dataset

Trained on Naholav/llama3.2-java-codegen-90sft-10meta-claude-v1 (see the loading sketch after this list):

  • 90,000 SFT examples for standard training
  • 10,000 meta-annotated examples with Claude's error analysis and learning insights
  • Source: CodeXGLUE text-to-code (Java) dataset
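
If you want to inspect the data yourself, it can be pulled directly from the Hugging Face Hub. The snippet below is only a minimal sketch; the split and column names are not documented here, so inspect the returned object:

from datasets import load_dataset

# Load the mixed 90k SFT / 10k meta-annotated dataset from the Hugging Face Hub
dataset = load_dataset("Naholav/llama3.2-java-codegen-90sft-10meta-claude-v1")

# Inspect the available splits and columns before building a training pipeline
print(dataset)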

Reflection-Based Training

This model uses a unique teacher-student reflection paradigm:

  • Teacher: Claude 4 Sonnet provides error analysis and guidance
  • Student: LLaMA 3.2 3B learns from its mistakes through structured reflection
  • Meta examples include error analysis and learning insights for deeper understanding

Training Configuration

  • Epochs: 3
  • Batch Size: 8 × 6 gradient accumulation = 48 effective
  • Learning Rate: 2e-5
  • Max Length: 2048 tokens
  • Precision: float32 (for stability)
  • Optimizer: AdamW
  • Scheduler: Cosine with warmup
  • Early Stopping: Dual tracking (SFT and Meta losses)
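
As a rough illustration, the configuration above maps onto Hugging Face TrainingArguments as sketched below. This is not the published training script: the warmup ratio and output directory are assumptions, and the reflection-specific dual loss tracking is omitted.

from transformers import TrainingArguments

# Approximate mapping of the reported hyperparameters; warmup_ratio and
# output_dir are assumptions rather than values from the actual run.
training_args = TrainingArguments(
    output_dir="./llama3.2-3b-java-reflection",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=6,   # 8 x 6 = 48 effective batch size
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,               # assumed; the card only says "cosine with warmup"
    optim="adamw_torch",
    fp16=False,
    bf16=False,                      # full float32 for stability, as reported
)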

Hardware

  • GPU: NVIDIA A100 80GB
  • Training Time: ~9 hours
  • Framework: PyTorch 2.0+ with Transformers

Usage

Installation

pip install transformers torch

Quick Start

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "Naholav/llama-3.2-3b-100k-codeXGLUE-reflection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Prepare prompt
task_description = "returns the sum of two integers"
prompt = f"""You are an expert Java programmer. Generate a complete, working Java method for the given description.

Task: {task_description}

Requirements:
- Write a complete Java method
- Use proper syntax and naming conventions
- Include return statements where needed
- Keep it concise but functional

```java
"""

# Generate code
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    temperature=0.2,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id
)

generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)

Expected Output Format

The model generates Java methods following this pattern:

public int sum(int a, int b) {
    return a + b;
}
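
Note that generate returns the prompt tokens followed by the completion, so generated_code in the Quick Start still contains the prompt. A small post-processing step (a sketch, assuming the model either closes the ```java fence or simply stops at end-of-sequence) isolates the generated method:

# Decode only the newly generated tokens (everything after the prompt),
# then cut at the closing code fence if the model emits one.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
completion = tokenizer.decode(new_tokens, skip_special_tokens=True)
java_method = completion.split("```")[0].strip()
print(java_method)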

Testing on Your Own Data

Important: When running local evaluation on your own CodeXGLUE examples, clean the raw natural language descriptions before inference, since they contain concode separator tokens:

def clean_nl(nl_description):
    cleaned = nl_description.replace("concode_field_sep", " | ")
    cleaned = cleaned.replace("concode_elem_sep", ", ")
    return ' '.join(cleaned.split())
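
For example (the description below is invented to show the token substitutions, not taken from the dataset):

raw = "returns the maximum value concode_field_sep int[] values concode_elem_sep int size"
print(clean_nl(raw))
# -> returns the maximum value | int[] values , int size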

Performance

The model was evaluated during training with (see the early-stopping sketch after this list):

  • Separate tracking of SFT and Meta losses
  • 5 evaluations per epoch
  • Dual early stopping based on both loss types
  • Best checkpoint selected based on average validation loss
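
The dual early-stopping scheme described above could look roughly like the sketch below. This is illustrative only; the real training loop is not published, and the patience value is an assumption.

# Illustrative dual early stopping: stop only when *both* validation losses
# have stopped improving, and keep the checkpoint with the best average loss.
patience = 3  # assumed value, not taken from the actual run
best = {"sft": float("inf"), "meta": float("inf"), "avg": float("inf")}
stale = {"sft": 0, "meta": 0}
best_checkpoint = None

def on_evaluation(sft_loss, meta_loss, checkpoint_path):
    """Returns True when training should stop."""
    global best_checkpoint
    for name, loss in (("sft", sft_loss), ("meta", meta_loss)):
        if loss < best[name]:
            best[name], stale[name] = loss, 0
        else:
            stale[name] += 1
    avg = (sft_loss + meta_loss) / 2
    if avg < best["avg"]:
        best["avg"], best_checkpoint = avg, checkpoint_path
    return stale["sft"] >= patience and stale["meta"] >= patience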

Reflection Training Methodology

This model was trained using a novel approach where:

  1. Error Recognition: Model learns to identify common coding mistakes
  2. Pattern Analysis: Understands method signatures and class structures
  3. Knowledge Gaps: Recognizes missing OOP concepts
  4. Improvement Strategy: Internalizes better coding patterns

Meta examples included structured reflection prompts with (see the illustration after this list):

  • Student's incorrect attempt
  • Teacher's correct implementation
  • Detailed error analysis
  • Learning insights and guidance
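
A purely hypothetical illustration of what one meta example might contain is shown below; the field names and values are invented for clarity, so consult the dataset card for the real schema:

# Hypothetical meta example (invented field names and values)
meta_example = {
    "task": "returns the sum of two integers",
    "student_attempt": "public int sum(int a, int b) { return a - b; }",  # incorrect
    "teacher_solution": "public int sum(int a, int b) { return a + b; }",
    "error_analysis": "The student used subtraction where addition was required.",
    "learning_insight": "Match the operator to the arithmetic operation in the description.",
}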

Comparison with SFT Model

This is the reflection-based version; a counterpart trained with standard supervised fine-tuning (SFT) on the same data serves as the baseline, and the differences are summarized in the Key Differences from SFT Model section below.

Limitations

  • Trained specifically for Java method generation
  • May not generalize well to full classes or other programming languages
  • Best suited for single-method generation tasks
  • Context window limited to 2048 tokens

Ethical Considerations

  • The model should not be used to generate malicious code
  • Generated code should be reviewed before use in production
  • Not suitable for generating code that handles sensitive data without proper review

Key Differences from SFT Model

  • Training Data: Uses the same dataset but processes the meta examples differently
  • Learning Paradigm: Teacher-student reflection vs direct imitation
  • Loss Tracking: Dual tracking of SFT and Meta losses
  • Expected Benefit: Better understanding of coding patterns and error avoidance

Acknowledgments

  • Meta AI for the LLaMA 3.2 base model
  • Microsoft Research for the CodeXGLUE text-to-code (Java) dataset
  • Anthropic for Claude 4 Sonnet's error analysis and insights
  • Hugging Face for the training infrastructure