	---
	license: apache-2.0
	datasets:
	- motexture/cData
	language:
	- en
	base_model:
	- HuggingFaceTB/SmolLM2-1.7B-Instruct
	pipeline_tag: text-generation
	tags:
	- smoll
	- coding
	- coder
	- model
	- small
	---

	# SmolLCoder-1.7B-Instruct

	## Introduction

	SmolLCoder-1.7B-Instruct is a fine-tuned version of SmolLM2-1.7B-Instruct, trained on the cData coding dataset.

	## Quickstart

	The following snippet uses `apply_chat_template` to show how to load the tokenizer and model and how to generate a response.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "motexture/SmolLCoder-1.7B-Instruct"

	# device_map="auto" lets transformers place the weights (requires the accelerate package)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	prompt = "Write a C++ program that prints Hello World!"
	messages = [
	    {"role": "system", "content": "You are a helpful assistant."},
	    {"role": "user", "content": prompt},
	]
	# Render the conversation as a prompt string that ends with the assistant turn marker
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	# Move the inputs to the device the model was actually placed on
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    **model_inputs,  # passes input_ids and attention_mask
	    max_new_tokens=4096,
	    do_sample=True,
	    temperature=0.3,
	)
	# Strip the prompt tokens so only the newly generated tokens are decoded
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```
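
The slicing step near the end of the snippet is needed because, for a decoder-only model, `generate` returns each sequence with the prompt tokens first and the new tokens appended after them. The sketch below isolates that step with plain Python lists so it can be run without downloading the model. The helper name `strip_prompt_tokens` and the token ids are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def strip_prompt_tokens(
    prompt_ids: Sequence[Sequence[int]],
    output_ids: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Drop the echoed prompt from the front of each generated sequence."""
    return [
        list(out[len(prompt):])
        for prompt, out in zip(prompt_ids, output_ids)
    ]


# Toy token ids standing in for real tokenizer output (hypothetical values).
prompt_batch = [[101, 7, 8, 9]]
generated_batch = [[101, 7, 8, 9, 42, 43, 2]]

new_tokens = strip_prompt_tokens(prompt_batch, generated_batch)
print(new_tokens)  # [[42, 43, 2]]
```

Decoding only the sliced ids keeps the system and user messages out of `response`.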

	## License

	[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

	## Citation
	```bibtex
	@misc{allal2024SmolLM2,
	    title={SmolLM2 - with great data, comes great performance},
	    author={Loubna Ben Allal and Anton Lozhkov and Elie Bakouch and Gabriel Martín Blázquez and Lewis Tunstall and Agustín Piqueres and Andres Marafioti and Cyril Zakka and Leandro von Werra and Thomas Wolf},
	    year={2024},
	}
	```