Alsebay
/

L3-8B-SMaid-v0.3


	---
	base_model:
	- Sao10K/L3-8B-Stheno-v3.2
	- NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS
	library_name: transformers
	tags:
	- mergekit
	- merge

	---
	<img src="https://huggingface.co/Alsebay/L3-8B-SMaid-v0.3/resolve/main/cover/cover.png" alt="img" style="width: 60%; min-width: 120px; height:80%; min-height: 200px; max-width:360px; max-height:600px; display: block">

	> [!IMPORTANT]
	> Many thanks to @mradermacher for helping me find out that LumiMaid uses the 'smaug-bpe' pre-tokenizer. This means none of its quants can be used, so for now you can only load this model with Transformers (maybe this will be fixed, or support will be added, in the future).
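
Since Transformers is currently the only working loader, here is a minimal sketch of loading the model that way. The `load_model` helper and the `device_map="auto"` choice are illustrative assumptions (the latter needs the `accelerate` package), not something this card prescribes; `bfloat16` is taken from the merge config.

```python
MODEL_ID = "Alsebay/L3-8B-SMaid-v0.3"  # repository name of this model card


def load_model(model_id=MODEL_ID):
    """Load the tokenizer and weights with Transformers (downloads the checkpoint)."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # same dtype as in the merge configuration
        device_map="auto",           # assumption: requires the accelerate package
    )
    return tokenizer, model
```

Calling `tokenizer, model = load_model()` should fetch the weights on first use; adjust the dtype or device placement to your hardware.
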

	# Update: Both versions need different presets (settings) to work well
	Overall:

	Sao10K Stheno > SMaid V0.3 > SMaid V0.1 in Chai Benchmark

	SMaid V0.1 = Sao10K Stheno > SMaid V0.3 in my custom EQ bench (sadness, deep thought, and depression tests)

	Disclaimer: same seed, same character card, same scenario. 4 tries for each model.

	# The best of my L3-8B merge series. I chose the 2 best variants to publish.

	[SMaid-V0.1](https://huggingface.co/Alsebay/L3-8B-SMaid-v0.1): Smarter, understands content well, and writes in a more novel-like style. I like this version.

	SMaid-V0.3: An upgrade from v0.1. More talkative, active, and energetic (wrong setting, lol). Its writing style is closer to Stheno; it is essentially a Stheno version with more data from LumiMaid.

	There is no V0.2 because I deleted it; it was the worst model of the series.

	I think Stheno and LumiMaid can be like 'yin-yang', so I combined them, lol. I tested them on Chaiverse, and both got an Elo score > 1995 from the beginning. (Thanks to Sao10K for letting me know about ChaiVerse :) )

	SMaid = Stheno (it's very good) + LumiMaid (not too good, but the writing style is good)

	Recommended preset (feedback is welcome if you find any better settings)

	```
	Temperature - 0.9-1.1
	Min-P - 0.075-0.1
	Top-K - 40-50
	Top_P - 1
	Repetition Penalty - 1.1
	```
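
As a rough illustration, the preset above can be passed to the Transformers `generate` call. This sketch uses the midpoint of each recommended range; the `generate_reply` helper is hypothetical, it assumes a model and tokenizer already loaded with Transformers, and `min_p` is only accepted by recent Transformers releases, so check your installed version.

```python
# Midpoints of the ranges recommended above; tune within the ranges as needed.
SAMPLING_KWARGS = {
    "do_sample": True,           # sampling must be enabled for these to apply
    "temperature": 1.0,          # recommended 0.9-1.1
    "min_p": 0.0875,             # recommended 0.075-0.1 (needs a recent Transformers)
    "top_k": 45,                 # recommended 40-50
    "top_p": 1.0,                # 1 disables nucleus filtering
    "repetition_penalty": 1.1,
}


def generate_reply(model, tokenizer, prompt, max_new_tokens=256):
    """Hypothetical helper: sample a continuation of `prompt` with the preset."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, **SAMPLING_KWARGS
    )
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Other frontends expose the same sampler names in their settings panels, so the values carry over directly.
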
	---
	# Below is auto-generated by Mergekit


	This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

	## Merge Details
	### Merge Method

	This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method using [NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS](https://huggingface.co/NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS) as a base.
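
To give a feel for what DARE and TIES do, here is a toy sketch on plain Python lists. It is a simplified illustration of the two ideas (random drop-and-rescale of task deltas, then sign election and averaging of agreeing entries), not mergekit's actual implementation; the function names `dare_prune` and `ties_sign_merge` are made up for this example.

```python
import random


def dare_prune(delta, density, rng):
    """DARE: keep each delta entry with probability `density`, rescale by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def ties_sign_merge(deltas, weights):
    """TIES-style merge: per parameter, elect the dominant sign of the weighted
    deltas, then take the weighted average of the entries that agree with it."""
    merged = []
    for column in zip(*deltas):
        total = sum(w * d for w, d in zip(weights, column))
        sign = 1.0 if total >= 0 else -1.0
        agreeing = [(w, d) for w, d in zip(weights, column) if d * sign > 0]
        denom = sum(w for w, _ in agreeing)
        merged.append(sum(w * d for w, d in agreeing) / denom if denom else 0.0)
    return merged


# Toy deltas (fine-tuned weights minus base weights) for two models.
rng = random.Random(0)
delta_a = dare_prune([0.2, -0.4, 0.1, 0.3], density=0.4, rng=rng)
delta_b = dare_prune([0.1, 0.5, -0.2, 0.3], density=0.6, rng=rng)
merged_delta = ties_sign_merge([delta_a, delta_b], weights=[1.0, 0.9])
```

The merged delta would then be added back onto the base model's weights; `density` and `weight` here play the same roles as the per-slice parameters in the configuration.
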

	### Models Merged

	The following models were included in the merge:
	* [Sao10K/L3-8B-Stheno-v3.2](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2)

	### Configuration

	The following YAML configuration was used to produce this model:

	```yaml

	slices:
	- sources:
	  - layer_range: [0, 16]
	    model: NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS
	    parameters:
	      density: 0.4
	      weight: 1.0
	  - layer_range: [0, 16]
	    model: Sao10K/L3-8B-Stheno-v3.2
	    parameters:
	      density: 0.6
	      weight: 0.9
	- sources:
	  - layer_range: [16, 32]
	    model: NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS
	    parameters:
	      density: 0.2
	      weight: 0.5
	  - layer_range: [16, 32]
	    model: Sao10K/L3-8B-Stheno-v3.2
	    parameters:
	      density: 0.8
	      weight: 1.0
	merge_method: dare_ties
	base_model: NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS
	parameters:
	  int8_mask: true
	dtype: bfloat16

	```