Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models

Table of Contents

1. Model Summary
2. Use
3. Training
4. Citation

Model Summary

Astraios-FFT is an instruction-tuned model with 15.5B parameters, created by full-parameter finetuning (FFT) of StarCoderBase on CommitPackFT and OASST, as described in the Astraios paper.

Repository: bigcode-project/astraios
Paper: Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
Languages: 80+ programming languages

✨Astraios:

Data	CommitPackFT+OASST	Filtered versions of CommitPack and OASST, keeping high-quality commit messages that resemble instructions
Model	Astraios-1B	Collection of StarCoderBase-1B (1B parameters) models instruction-tuned on CommitPackFT+OASST with different tuning methods
	Astraios-3B	Collection of StarCoderBase-3B (3B parameters) models instruction-tuned on CommitPackFT+OASST with different tuning methods
	Astraios-7B	Collection of StarCoderBase-7B (7B parameters) models instruction-tuned on CommitPackFT+OASST with different tuning methods
	Astraios-16B	Collection of StarCoderBase-16B (16B parameters) models instruction-tuned on CommitPackFT+OASST with different tuning methods
Evaluation	BigCloneBench	Dataset for clone detection; we use 2,000 samples for evaluation
	Devign	Dataset for defect detection; we use 2,000 samples for evaluation
	HumanEvalPack	Extension of OpenAI's HumanEval covering 3 scenarios across 6 languages
	ReCode	Dataset for evaluating the robustness of code generation, covering 4 variants
	Asleep At The Keyboard	Datasets for evaluating the security of code generation; we use the DoW (Diversity of Weakness) scenarios for evaluation

Use

Intended use

The model follows instructions provided in the input. Always preface your input with "Question: " and end it with "Answer:", for example (the question and "Answer:" are separated by a blank line, i.e. "\n\n"):

Question: Please write a function in Python that performs bubble sort.

Answer:
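Because the prompt format is fixed, it can be convenient to wrap it in a small helper. The sketch below is only an illustration and is not part of the Astraios repository: `build_prompt` and `extract_answer` are hypothetical names, and splitting on the first "Answer:" assumes the question itself does not contain that marker.

```python
# Hypothetical helpers (not from the Astraios repo) for the
# "Question: ... Answer:" prompt format described above.

PROMPT_TEMPLATE = "Question: {question}\n\nAnswer:"


def build_prompt(question: str) -> str:
    """Wrap a plain instruction in the Question/Answer format."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def extract_answer(generated: str) -> str:
    """Return only the text after the first "Answer:" marker.

    Falls back to the whole (stripped) string if the marker is absent.
    """
    _, sep, answer = generated.partition("Answer:")
    return answer.strip() if sep else generated.strip()


prompt = build_prompt("Please write a function in Python that performs bubble sort.")
print(repr(prompt))
# 'Question: Please write a function in Python that performs bubble sort.\n\nAnswer:'

# A made-up completion, standing in for decoded model output
completion = prompt + " def bubble_sort(xs):\n    ..."
print(extract_answer(completion))
```

Since the decoded output of a causal language model echoes the prompt, a helper like `extract_answer` keeps only the model's reply.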

Feel free to share your generations in the Community tab!

Generation

# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/astraios-fft"
device = "cuda" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Prompts must start with "Question: " and end with "Answer:"
prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
# Set an explicit length budget; the default generation length may be too short for code
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
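The generation snippet loads the full 15.5B-parameter model, so memory is usually the first practical constraint. A rough, weights-only estimate is number of parameters × bytes per parameter. The sketch below computes this for a few common precisions. The function name is hypothetical, and activations, the KV cache, and framework overhead are ignored, so real usage will be higher.

```python
# Rough lower bound on the memory needed just to hold the model weights.
# Assumes 15.5B parameters (from the model summary); activations, the KV
# cache, and framework overhead are NOT included.

NUM_PARAMS = 15.5e9

BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16": 2,
    "fp16": 2,
    "int8": 1,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
# fp32: ~62.0 GB
# bf16: ~31.0 GB
# fp16: ~31.0 GB
# int8: ~15.5 GB
```

If fp32 weights do not fit on your GPU, Transformers' `from_pretrained` accepts a `torch_dtype` argument (for example `torch_dtype=torch.bfloat16`) for loading in half precision. The model was trained in fp32, and this card does not report how lower precision affects output quality.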

Training

Model

Architecture: GPT-2-style model with multi-query attention and a Fill-in-the-Middle objective
Steps: 250k pretraining steps & 200 instruction-tuning steps
Precision: fp32
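Multi-query attention (MQA), listed under Architecture, lets every query head share a single key head and a single value head. This shrinks the key/value cache during generation. The NumPy sketch below illustrates the idea for one causal sequence. It is a simplified teaching example with random weights and hypothetical function names, not StarCoderBase's actual implementation: there are no biases, no layer norm, no batching, and no Fill-in-the-Middle handling.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_query_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Causal multi-query attention for a single sequence (illustrative only).

    x:   (seq_len, d_model)
    w_q: (d_model, num_heads * d_head)  one query projection per head
    w_k: (d_model, d_head)              ONE key head shared by all query heads
    w_v: (d_model, d_head)              ONE value head shared by all query heads
    w_o: (num_heads * d_head, d_model)
    """
    seq_len = x.shape[0]
    d_head = w_k.shape[1]

    # (H, T, d): split the query projection into heads
    q = (x @ w_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = x @ w_k  # (T, d), shared across heads
    v = x @ w_v  # (T, d), shared across heads

    # (H, T, d) @ (d, T) broadcasts the shared keys over all heads -> (H, T, T)
    scores = q @ k.T / np.sqrt(d_head)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)  # block attention to future tokens
    attn = softmax(scores, axis=-1)

    out = attn @ v  # (H, T, T) @ (T, d) -> (H, T, d)
    out = out.transpose(1, 0, 2).reshape(seq_len, num_heads * d_head)
    return out @ w_o  # (T, d_model)


# Toy configuration with random weights
rng = np.random.default_rng(0)
d_model, num_heads, d_head, seq_len = 64, 8, 8, 5
x = rng.standard_normal((seq_len, d_model))
w_q = rng.standard_normal((d_model, num_heads * d_head))
w_k = rng.standard_normal((d_model, d_head))
w_v = rng.standard_normal((d_model, d_head))
w_o = rng.standard_normal((num_heads * d_head, d_model))

y = multi_query_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(y.shape)  # (5, 64)
```

With `num_heads` query heads, standard multi-head attention caches `num_heads` key vectors and `num_heads` value vectors per token during generation. MQA caches one of each, which cuts the KV-cache size by a factor of `num_heads` (8 in this toy configuration).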

Hardware

Pretraining:
- GPUs: 512 Tesla A100
- Training time: 24 days
Instruction tuning:
- GPUs: 8 Tesla A100
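To give a sense of scale, the pretraining figures above convert to GPU-hours as GPUs × days × 24. The sketch assumes all 512 GPUs ran for the full 24 days of wall-clock time. The card gives no instruction-tuning time, so only pretraining is computed, and the helper name is hypothetical.

```python
# Back-of-the-envelope GPU-hour count for the pretraining run listed above.
# Assumes every GPU was busy for the entire wall-clock training time.


def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours = number of GPUs x days x 24 hours/day."""
    return num_gpus * days * 24


pretraining = gpu_hours(num_gpus=512, days=24)
print(f"Pretraining: {pretraining:,.0f} A100 GPU-hours")  # Pretraining: 294,912 A100 GPU-hours
```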

Software

Orchestration: Megatron-LM/Transformers
Neural networks: PyTorch

Citation
