LLaMA 3.2 3B - Java Code Generation (Reflection)

This model is a fine-tuned version of meta-llama/Llama-3.2-3B specifically trained for Java method generation using a novel reflection-based meta-learning approach.

Model Description

  • Base Model: LLaMA 3.2 3B
  • Training Method: Reflection-based Meta-Learning
  • Task: Java method generation from natural language descriptions
  • Training Data: 100k examples from CodeXGLUE dataset with Claude annotations
  • Language: Java
  • License: LLaMA 3.2 Community License

Training Details

Dataset

Trained on Naholav/llama3.2-java-codegen-90sft-10meta-claude-v1 (see the loading sketch after this list):

  • 90,000 SFT examples for standard training
  • 10,000 meta-annotated examples with Claude's error analysis and learning insights
  • Source: CodeXGLUE text-to-code (Java) dataset
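
If you want to inspect the data yourself, it can be pulled directly from the Hugging Face Hub. The snippet below is only a minimal sketch; the split and column names are not documented here, so inspect the returned object:

from datasets import load_dataset

# Load the mixed 90k SFT / 10k meta-annotated dataset from the Hugging Face Hub
dataset = load_dataset("Naholav/llama3.2-java-codegen-90sft-10meta-claude-v1")

# Inspect the available splits and columns before building a training pipeline
print(dataset)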

Reflection-Based Training

This model uses a unique teacher-student reflection paradigm:

  • Teacher: Claude 4 Sonnet provides error analysis and guidance
  • Student: LLaMA 3.2 3B learns from its mistakes through structured reflection
  • Meta examples include error analysis and learning insights for deeper understanding

Training Configuration

  • Epochs: 3
  • Batch Size: 8 × 6 gradient accumulation = 48 effective
  • Learning Rate: 2e-5
  • Max Length: 2048 tokens
  • Precision: float32 (for stability)
  • Optimizer: AdamW
  • Scheduler: Cosine with warmup
  • Early Stopping: Dual tracking (SFT and Meta losses)
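
As a rough illustration, the configuration above maps onto Hugging Face TrainingArguments as sketched below. This is not the published training script: the warmup ratio and output directory are assumptions, and the reflection-specific dual loss tracking is omitted.

from transformers import TrainingArguments

# Approximate mapping of the reported hyperparameters; warmup_ratio and
# output_dir are assumptions rather than values from the actual run.
training_args = TrainingArguments(
    output_dir="./llama3.2-3b-java-reflection",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=6,   # 8 x 6 = 48 effective batch size
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,               # assumed; the card only says "cosine with warmup"
    optim="adamw_torch",
    fp16=False,
    bf16=False,                      # full float32 for stability, as reported
)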

Hardware

  • GPU: NVIDIA A100 80GB
  • Training Time: ~9 hours
  • Framework: PyTorch 2.0+ with Transformers

Usage

Installation

pip install transformers torch

Quick Start

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "Naholav/llama-3.2-3b-100k-codeXGLUE-reflection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Prepare prompt
task_description = "returns the sum of two integers"
prompt = f"""You are an expert Java programmer. Generate a complete, working Java method for the given description.

Task: {task_description}

Requirements:
- Write a complete Java method
- Use proper syntax and naming conventions
- Include return statements where needed
- Keep it concise but functional

```java
"""

# Generate code
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    temperature=0.2,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id
)

generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)

Expected Output Format

The model generates Java methods following this pattern:

public int sum(int a, int b) {
    return a + b;
}
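
Note that generate returns the prompt tokens followed by the completion, so generated_code in the Quick Start still contains the prompt. A small post-processing step (a sketch, assuming the model either closes the ```java fence or simply stops at end-of-sequence) isolates the generated method:

# Decode only the newly generated tokens (everything after the prompt),
# then cut at the closing code fence if the model emits one.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
completion = tokenizer.decode(new_tokens, skip_special_tokens=True)
java_method = completion.split("```")[0].strip()
print(java_method)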

Testing on Your Own Data

Important: When running local evaluation on your own CodeXGLUE examples, clean the raw natural language descriptions before inference, since they contain concode separator tokens:

def clean_nl(nl_description):
    cleaned = nl_description.replace("concode_field_sep", " | ")
    cleaned = cleaned.replace("concode_elem_sep", ", ")
    return ' '.join(cleaned.split())
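
For example (the description below is invented to show the token substitutions, not taken from the dataset):

raw = "returns the maximum value concode_field_sep int[] values concode_elem_sep int size"
print(clean_nl(raw))
# -> returns the maximum value | int[] values , int size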

Performance

The model was evaluated during training with (see the early-stopping sketch after this list):

  • Separate tracking of SFT and Meta losses
  • 5 evaluations per epoch
  • Dual early stopping based on both loss types
  • Best checkpoint selected based on average validation loss
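
The dual early-stopping scheme described above could look roughly like the sketch below. This is illustrative only; the real training loop is not published, and the patience value is an assumption.

# Illustrative dual early stopping: stop only when *both* validation losses
# have stopped improving, and keep the checkpoint with the best average loss.
patience = 3  # assumed value, not taken from the actual run
best = {"sft": float("inf"), "meta": float("inf"), "avg": float("inf")}
stale = {"sft": 0, "meta": 0}
best_checkpoint = None

def on_evaluation(sft_loss, meta_loss, checkpoint_path):
    """Returns True when training should stop."""
    global best_checkpoint
    for name, loss in (("sft", sft_loss), ("meta", meta_loss)):
        if loss < best[name]:
            best[name], stale[name] = loss, 0
        else:
            stale[name] += 1
    avg = (sft_loss + meta_loss) / 2
    if avg < best["avg"]:
        best["avg"], best_checkpoint = avg, checkpoint_path
    return stale["sft"] >= patience and stale["meta"] >= patience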

Reflection Training Methodology

This model was trained using a novel approach where:

  1. Error Recognition: Model learns to identify common coding mistakes
  2. Pattern Analysis: Understands method signatures and class structures
  3. Knowledge Gaps: Recognizes missing OOP concepts
  4. Improvement Strategy: Internalizes better coding patterns

Meta examples included structured reflection prompts with (see the illustration after this list):

  • Student's incorrect attempt
  • Teacher's correct implementation
  • Detailed error analysis
  • Learning insights and guidance
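
A purely hypothetical illustration of what one meta example might contain is shown below; the field names and values are invented for clarity, so consult the dataset card for the real schema:

# Hypothetical meta example (invented field names and values)
meta_example = {
    "task": "returns the sum of two integers",
    "student_attempt": "public int sum(int a, int b) { return a - b; }",  # incorrect
    "teacher_solution": "public int sum(int a, int b) { return a + b; }",
    "error_analysis": "The student used subtraction where addition was required.",
    "learning_insight": "Match the operator to the arithmetic operation in the description.",
}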

Comparison with SFT Model

This is the reflection-based version; a counterpart trained with standard supervised fine-tuning (SFT) on the same data serves as the baseline, and the differences are summarized in the Key Differences from SFT Model section below.

Limitations

  • Trained specifically for Java method generation
  • May not generalize well to full classes or other programming languages
  • Best suited for single-method generation tasks
  • Context window limited to 2048 tokens

Ethical Considerations

  • The model should not be used to generate malicious code
  • Generated code should be reviewed before use in production
  • Not suitable for generating code that handles sensitive data without proper review

Key Differences from SFT Model

  • Training Data: Uses the same dataset but processes the meta examples differently
  • Learning Paradigm: Teacher-student reflection vs direct imitation
  • Loss Tracking: Dual tracking of SFT and Meta losses
  • Expected Benefit: Better understanding of coding patterns and error avoidance

Acknowledgments

  • Meta AI for the LLaMA 3.2 base model
  • Microsoft Research for the CodeXGLUE text-to-code (Java) dataset
  • Anthropic for Claude 4 Sonnet's error analysis and insights
  • Hugging Face for the training infrastructure