Instructions to use ArmelR/starcoder-gradio-v0 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use ArmelR/starcoder-gradio-v0 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ArmelR/starcoder-gradio-v0")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ArmelR/starcoder-gradio-v0")
model = AutoModelForCausalLM.from_pretrained("ArmelR/starcoder-gradio-v0")

Local Apps

vLLM

How to use ArmelR/starcoder-gradio-v0 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ArmelR/starcoder-gradio-v0"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ArmelR/starcoder-gradio-v0",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
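
The request above uses a generic story prompt. Because the model was instruction-tuned with the `Question: {question}\n\nAnswer: {answer}` template (see the Fine-tuning section), wrapping requests in that template is likely to work better. A minimal Python sketch, assuming the vLLM server started above is listening on localhost:8000; `build_payload` and `complete` are hypothetical helper names, not part of vLLM:

```python
import json
import urllib.request

MODEL = "ArmelR/starcoder-gradio-v0"


def build_payload(question, max_tokens=512, temperature=0.5):
    """Build an OpenAI-style completions body using the model's prompt template."""
    return {
        "model": MODEL,
        "prompt": f"Question: {question}\n\nAnswer: ",
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(question, base_url="http://localhost:8000"):
    """POST the payload to the server (assumed to be running) and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example (requires the server to be up):
# print(complete("Create a gradio app that converts Celsius to Fahrenheit"))
```

The same helpers should work against the SGLang server by passing `base_url="http://localhost:30000"`.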

Use Docker

docker model run hf.co/ArmelR/starcoder-gradio-v0

SGLang

How to use ArmelR/starcoder-gradio-v0 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ArmelR/starcoder-gradio-v0" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ArmelR/starcoder-gradio-v0",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ArmelR/starcoder-gradio-v0" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ArmelR/starcoder-gradio-v0",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use ArmelR/starcoder-gradio-v0 with Docker Model Runner:
```
docker model run hf.co/ArmelR/starcoder-gradio-v0
```

Description

This language model is version 0.0 of a Gradio Coding Assistant. It is an instruction fine-tuned version of StarCoder, designed to assist developers who use gradio.

Dataset

The dataset is multi-source. Its content comes from the following sources:

The Stack

More precisely, we looked into the-stack-dedup, which contains code under permissive licenses. We shortlisted the files whose content included the keyword gradio.

GitHub Issues

We scraped all the issues from the official repository the-gradio-app/gradio and added them to our training dataset.

Spaces on Hugging Face Hub

We used the HuggingFace_Hub API to scrape data from the Spaces that are built with gradio. We retained those with permissive licenses, namely MIT and Apache 2.0. The resulting code was then deduplicated.
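
The card does not say which deduplication method was used. As one possible illustration, the sketch below performs exact deduplication by hashing lightly normalized file contents; the normalization rule, the `path` field, and the helper names are assumptions made for this example.

```python
import hashlib


def normalize(code):
    """Strip trailing whitespace per line and surrounding blank lines before hashing."""
    return "\n".join(line.rstrip() for line in code.strip().splitlines())


def deduplicate(files):
    """Keep the first file for each distinct normalized content hash."""
    seen = set()
    unique = []
    for record in files:
        digest = hashlib.sha256(normalize(record["content"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


# Toy records (made up for illustration)
files = [
    {"path": "a/app.py", "content": "import gradio as gr\n"},
    {"path": "b/app.py", "content": "import gradio as gr   \n\n"},  # same code, extra whitespace
    {"path": "c/app.py", "content": "import gradio as gr\ngr.Interface(...)\n"},
]
unique_files = deduplicate(files)
print([f["path"] for f in unique_files])
```

Near-duplicate detection (for example MinHash) would catch more copies than exact hashing, at the cost of extra complexity.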

Training setting and hyperparameters

For our fine-tuning, we followed a two-step strategy:

- Pretraining (fine-tuning) with next-token prediction on the gradio dataset built above. This step should familiarize the model with gradio syntax.
- Instruction fine-tuning on an instruction dataset. This step should make the model conversational.

For both steps, we used parameter-efficient fine-tuning via the PEFT library, more precisely LoRA. Our training script is the well-known StarCoder fine-tuning script.

Resources

Our training was done on 8 A100 GPUs with 80 GB of memory each.

Pretraining

These are the parameters that we used:

learning rate: 5e-4
gradient_accumulation_steps: 4
batch_size: 1
sequence length: 2048
max_steps: 1000
warmup_steps: 5
weight_decay: 0.05
learning rate scheduler: cosine
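
For orientation, these settings imply the following amount of data per optimizer step. This is a back-of-the-envelope sketch that assumes all 8 GPUs were used in data-parallel mode and that batch_size is per device; the card does not state either explicitly.

```python
num_gpus = 8                      # assumption: data parallel over all 8 A100s
batch_size = 1                    # assumed to be per device
gradient_accumulation_steps = 4
sequence_length = 2048
max_steps = 1000

sequences_per_step = num_gpus * batch_size * gradient_accumulation_steps
tokens_per_step = sequences_per_step * sequence_length
tokens_at_max_steps = tokens_per_step * max_steps

print(sequences_per_step)    # 32
print(tokens_per_step)       # 65536
print(tokens_at_max_steps)   # 65536000
```

The token counts are upper bounds: they are reached only if every sequence is filled to the full 2048 tokens.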

LoRA parameters:

r = 16
alpha = 32
dropout = 0.05
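
To give a sense of what these values mean: LoRA freezes a weight matrix W (d_out × d_in) and learns a low-rank update B·A that is scaled by alpha / r, so only r × (d_in + d_out) parameters are trained per adapted matrix. The sketch below works this out for r = 16 and alpha = 32. The 6144 × 6144 matrix size is an illustrative assumption, not a value taken from this card.

```python
r = 16
alpha = 32
d_in = d_out = 6144  # illustrative size only (assumption)

scaling = alpha / r                      # factor applied to the low-rank update B @ A
lora_params = r * (d_in + d_out)         # A is r x d_in, B is d_out x r
full_params = d_in * d_out               # parameters in the frozen matrix W
trainable_fraction = lora_params / full_params

print(scaling)                           # 2.0
print(lora_params)                       # 196608
print(f"{trainable_fraction:.4%}")
```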

We stopped training before the end and kept checkpoint-100 for the second step.
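
Since training was stopped early, it can help to see where checkpoint-100 sits on the learning rate schedule. The sketch below assumes a linear warmup followed by a half-cosine decay to zero, which is the usual shape of a cosine scheduler with warmup. The exact scheduler implementation used in training is not stated here, and `lr_at` is a hypothetical helper.

```python
import math

PEAK_LR = 5e-4
WARMUP_STEPS = 5
MAX_STEPS = 1000


def lr_at(step):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 5, 100, 500, 1000):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions, the learning rate at step 100 is still close to its peak value.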

Fine-tuning

This step consisted of instruction fine-tuning the previous checkpoint. For that purpose, we used a modified version of openassistant-guanaco. The template for the instruction fine-tuning was Question: {question}\n\nAnswer: {answer}. We used exactly the same parameters as during pretraining, and we kept checkpoint-50.
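
The template can be applied with a small helper such as the one below. This is a sketch: `format_example` is a hypothetical name, and the question and answer strings are made up for illustration.

```python
TEMPLATE = "Question: {question}\n\nAnswer: {answer}"


def format_example(question, answer=""):
    """Render one training example, or an inference prompt when answer is empty."""
    return TEMPLATE.format(question=question, answer=answer)


# A full training example, and the matching inference prompt with the answer left open
train_text = format_example("How do I launch a gradio app?", "Call `demo.launch()`.")
prompt = format_example("How do I launch a gradio app?")
print(repr(prompt))
```

Leaving the answer empty yields a prompt that ends in "Answer: ", which is the form used at inference time.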

Usage

The usage is straightforward and very similar to any other instruction fine-tuned model.

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_name = "ArmelR/starcoder-gradio-v0"
model = AutoModelForCausalLM.from_pretrained(checkpoint_name)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_name)

# Wrap the request in the template used during instruction fine-tuning
prompt = "Create a gradio application that converts a temperature in Celsius to Fahrenheit"
inputs = tokenizer(f"Question: {prompt}\n\nAnswer: ", return_tensors="pt")

outputs = model.generate(
  inputs["input_ids"],
  attention_mask=inputs["attention_mask"],
  do_sample=True,  # needed for temperature and top_p to take effect
  temperature=0.2,
  top_p=0.95,
  max_new_tokens=200
)

# Number of prompt tokens (sequence dimension), so that only the generated answer is decoded
input_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][input_len:], skip_special_tokens=True))

Updates

Gradio dataset: .filter(lambda x: ("gradio" in x["content"] or "gr." in x["content"]) and "streamlit" not in x["content"])

Guanaco: ArmelR/oasst1_guanaco
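
The filter predicate above can be tried on a few toy records without loading the dataset. In this sketch, `keep` is a hypothetical name for the lambda, and the sample records are invented for illustration; with the datasets library, the same function could be passed to `Dataset.filter`.

```python
def keep(example):
    """Keep files that mention gradio (or the `gr.` alias) and do not mention streamlit."""
    content = example["content"]
    return ("gradio" in content or "gr." in content) and "streamlit" not in content


# Toy records (made up); only the "content" field is used by the predicate
samples = [
    {"id": 1, "content": "import gradio as gr\ndemo = gr.Interface(fn, 'text', 'text')"},
    {"id": 2, "content": "import streamlit as st\nimport gradio as gr"},
    {"id": 3, "content": "import numpy as np"},
    {"id": 4, "content": "h, w = bgr.shape[:2]"},  # matches "gr." without using gradio
]
kept = [s["id"] for s in samples if keep(s)]
print(kept)
```

Record 4 shows a limitation of substring matching: `gr.` also matches unrelated identifiers that happen to end in "gr".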

StarCoderbase (950, 1350)
- max_steps = 2000
- shuffle_buffer = 100
- batch_size = 2
- gradient_accumulation_steps = 4
- num_warmup_steps = 100
- weight_decay = 0.01
StarCoderplus (2000)

Guanaco multi-turn (HuggingFaceH4/oasst1_en)

More information

For further information, refer to StarCoder.

Paper

LoRA: Low-Rank Adaptation of Large Language Models (arXiv:2106.09685, published Jun 17, 2021)