ToastyPigeon
/

Gemma-3-Starshine-12B-Alt

	---
	base_model:
	- allura-org/Gemma-3-Glitter-12B
	- ToastyPigeon/Gemma-3-Confetti-12B
	- google/gemma-3-12b-it
	- google/gemma-3-12b-pt
	library_name: transformers
	tags:
	- mergekit
	- merge
	---
	# 🌠G3 Starshine 12B - Alt🌠
	<figure>
	<img src="https://huggingface.co/ToastyPigeon/Gemma-3-Starshine-12B/resolve/main/modelcard_image.jpeg" width="600">
	</figure>

	This was Merge A2 in the testing set.

	A creative writing model based on a merge of fine-tunes on Gemma 3 12B IT and Gemma 3 12B PT.

	This is the RP Focused merge. This version better handles separate characters in turn-based chats with less impersonation.

	See the main [Story Focused](https://huggingface.co/ToastyPigeon/Gemma-3-Starshine-12B) version as well.

	This is a merge of two G3 models, one trained on instruct and one trained on base:
	* [allura-org/Gemma-3-Glitter-12B](https://huggingface.co/allura-org/Gemma-3-Glitter-12B) - Itself a merge of a storywriting train and an RP train (both also by ToastyPigeon), on instruct.
	* [ToastyPigeon/Gemma-3-Confetti-12B](https://huggingface.co/ToastyPigeon/Gemma-3-Confetti-12B) - Experimental application of the Glitter data using base instead of instruct. It additionally includes some adventure data in the form of SpringDragon.

	The result is a lovely blend of Glitter's ability to follow instructions and Confetti's free-spirit prose, effectively 'loosening up' much of the hesitancy that was left in Glitter.

	Vision works (as well as any vision works with this model right now) if you pair a GGUF of this with an appropriate mmproj file. I intend to fix the missing vision tower and make this properly multimodal in the near future.

	Thank you to [jebcarter](https://huggingface.co/jebcarter) for the idea to make this. I love how it turned out!

	## Instruct Format

	Uses the Gemma 2/3 instruct format, but has been trained to recognize an optional system role.

	Note: While it won't immediately balk at the system role, results may be better without it.

	```
	<start_of_turn>system
	{optional system turn with prompt}<end_of_turn>
	<start_of_turn>user
	{User messages; can also put sysprompt here to use the built-in g3 training}<end_of_turn>
	<start_of_turn>model
	{model response}<end_of_turn>
	```
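If you are assembling prompts by hand rather than through a chat template, the format above can be sketched roughly like this. The function name `build_prompt` and the message-dict shape are made up for illustration. The sketch also assumes your tokenizer or backend adds any BOS token itself, since the template above does not show one.

```python
def build_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the turn format shown above.

    `messages` is a list of {"role": ..., "content": ...} dicts
    (a hypothetical shape chosen for this sketch).
    """
    parts = []
    for msg in messages:
        role = msg["role"]
        # Many frontends call the model's role "assistant"; this format calls it "model".
        if role == "assistant":
            role = "model"
        if role not in ("system", "user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n")
    if add_generation_prompt:
        # Leave an open model turn for the model to complete.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt([
        {"role": "system", "content": "You are a narrator."},  # optional turn
        {"role": "user", "content": "Describe the tavern."},
    ]))
```

To skip the system role (which may give better results, per the note above), drop the system message and put that text at the top of the first user message instead.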

	## Merge Configuration

	A higher percentage of Glitter gives this model better turn-based instruct following, but it may be more uptight compared to the Story Focused version.

	```yaml
	models:
	  - model: ToastyPigeon/Gemma-3-Confetti-12B
	    parameters:
	      weight: 0.3
	  - model: allura-org/Gemma-3-Glitter-12B
	    parameters:
	      weight: 0.7
	merge_method: linear
	tokenizer_source: allura-org/Gemma-3-Glitter-12B
	```
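For intuition, a linear merge boils down to a per-tensor weighted average of the two models' parameters. The toy sketch below shows that arithmetic on plain Python lists. It is not mergekit's code: `linear_merge` and the tiny fake "state dicts" are invented for illustration, and the `normalize` flag is my assumption about how the weights are rescaled (it makes no difference here, since 0.3 + 0.7 = 1).

```python
def linear_merge(state_dicts, weights, normalize=True):
    """Weighted average of matching tensors (toy version using flat lists)."""
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    # Assumption: weights are rescaled to sum to 1 when normalize is True.
    total = sum(weights) if normalize else 1.0
    merged = {}
    for name, reference in state_dicts[0].items():
        merged[name] = [
            sum(w * sd[name][i] for sd, w in zip(state_dicts, weights)) / total
            for i in range(len(reference))
        ]
    return merged


if __name__ == "__main__":
    # Stand-ins for real checkpoints: one tiny "tensor" per model.
    confetti = {"layer.weight": [1.0, 2.0]}
    glitter = {"layer.weight": [3.0, 4.0]}
    print(linear_merge([confetti, glitter], [0.3, 0.7]))
```

With these weights every merged parameter sits 70% of the way from Confetti to Glitter, which is why this version leans toward Glitter's instruction following.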