	---
	library_name: transformers
	license: apache-2.0
	datasets:
	- pszemraj/flan-subsets-deduped
	language:
	- en
	base_model: pszemraj/tFINE-900m-e16-d32-1024ctx
	pipeline_tag: text2text-generation
	---

	# BEE-spoke-data/tFINE-900m-e16-d32-flan

	This is a basic text-to-text "instruct" model, similar to Google's original [flan-t5](https://huggingface.co/collections/google/flan-t5-release-65005c39e3201fff885e22fb) model series (but not trained for as long).


	<details>
	<summary>Details: Click here to expand</summary>

	Fine-tuned from [the base model](https://hf.co/pszemraj/tFINE-900m-e16-d32-1024ctx) for 1 epoch on the `flan-v2` subset of the `pszemraj/flan-subsets-deduped` dataset. It achieves the following results on the evaluation set:
	- Loss: 1.4134
	- Rouge1: 62.9142
	- Rouge2: 22.5279
	- RougeL: 61.4902
	- RougeLsum: 61.7795
	- Gen Len: 12.0586
	- Num Input Tokens Seen: 1931815668
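For readers unfamiliar with the ROUGE numbers above, here is a minimal sketch of what a ROUGE-1 F1 score measures: clipped unigram overlap between a prediction and a reference. It is a simplification that assumes whitespace tokenization and lowercasing; it is not the evaluation code used for this model, and `rouge1_f1` is a hypothetical helper name. The reported values appear to be on a 0-100 scale, so the example multiplies by 100 when printing.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1: unigram-overlap F1 on lowercased whitespace tokens."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Five of the six tokens overlap, so precision = recall = F1 = 5/6.
score = rouge1_f1("the cat sat on the mat", "the cat is on the mat")
print(f"ROUGE-1 F1: {100 * score:.2f}")  # ROUGE-1 F1: 83.33
```

ROUGE-2 applies the same idea to bigrams, and ROUGE-L replaces the n-gram overlap with the longest common subsequence.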

	### Model features

	- pretrained and fine-tuned with a 1024-token input context length
	- tokenizer with byte-pair fallback, so the model can read and generate text that the original T5 tokenizer cannot represent
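To illustrate why a fallback to bytes widens coverage, the sketch below uses a toy character-level vocabulary and falls back to one `<0xNN>` token per UTF-8 byte for anything outside it. The `<0xNN>` naming follows the SentencePiece byte-fallback convention; that this model's tokenizer uses the same layout is an assumption, and the real tokenizer works on subword pieces rather than single characters. Both function names are hypothetical.

```python
def encode_with_byte_fallback(text: str, vocab: set[str]) -> list[str]:
    """Emit known characters as-is; spell unknown ones as UTF-8 byte tokens."""
    tokens: list[str] = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            # One token per byte, so no input ever maps to an <unk> token.
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens


def decode_with_byte_fallback(tokens: list[str]) -> str:
    """Invert the encoding by reassembling raw bytes and decoding as UTF-8."""
    buf = bytearray()
    for tok in tokens:
        if len(tok) == 6 and tok.startswith("<0x") and tok.endswith(">"):
            buf.append(int(tok[3:5], 16))
        else:
            buf.extend(tok.encode("utf-8"))
    return buf.decode("utf-8")


toy_vocab = set("abcdefghijklmnopqrstuvwxyz ")  # toy vocabulary, illustration only
tokens = encode_with_byte_fallback("hi 🐝", toy_vocab)
print(tokens)  # ['h', 'i', ' ', '<0xF0>', '<0x9F>', '<0x90>', '<0x9D>']
print(decode_with_byte_fallback(tokens))
```

The emoji is outside the toy vocabulary, yet it survives the round trip because its four UTF-8 bytes are each representable.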

	</details>

	## Usage Example

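	```py
	from transformers import pipeline

	# Load the fine-tuned checkpoint as a text2text-generation pipeline
	pipe = pipeline(
	    "text2text-generation",
	    model="BEE-spoke-data/tFINE-900m-e16-d32-flan",
	)
	prompt = "What color is tuesday?"
	# top_k together with penalty_alpha requests contrastive search decoding
	res = pipe(prompt, max_new_tokens=96, top_k=4, penalty_alpha=0.6)
	print(res[0]["generated_text"])
	```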
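If you call the model from several places, it can help to keep the generation settings from the example above in one spot. The following is a minimal sketch, not part of the model's API: `ask` and `DEFAULT_GEN_KWARGS` are hypothetical names, and the function only assumes that `pipe` behaves like the pipeline object above (callable with a prompt plus keyword arguments, returning a list of dicts with a `generated_text` key).

```python
from typing import Any, Callable

# Defaults copied from the usage example above.
DEFAULT_GEN_KWARGS = {"max_new_tokens": 96, "top_k": 4, "penalty_alpha": 0.6}


def ask(pipe: Callable[..., list], prompt: str, **overrides: Any) -> str:
    """Run one prompt through a text2text pipeline and return the stripped text."""
    # Caller-supplied keyword arguments take precedence over the defaults.
    gen_kwargs = {**DEFAULT_GEN_KWARGS, **overrides}
    res = pipe(prompt, **gen_kwargs)
    return res[0]["generated_text"].strip()


# Usage with the real model (downloads weights, so it is left commented out):
# from transformers import pipeline
# pipe = pipeline("text2text-generation", model="BEE-spoke-data/tFINE-900m-e16-d32-flan")
# print(ask(pipe, "What color is tuesday?"))
# print(ask(pipe, "Translate to German: good morning", max_new_tokens=32))
```

Passing the pipeline in as an argument keeps the helper easy to test with a stand-in callable and avoids loading the model more than once.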

	## Quick eval

	Quick eval for: `BEE-spoke-data/tFINE-900m-e16-d32-flan`


	hf (pretrained=BEE-spoke-data/tFINE-900m-e16-d32-flan,trust_remote_code=True,dtype=bfloat16,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

	|    Tasks    |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
	|-------------|------:|----------------|-----:|-----------|---|-----:|---|------|
	|boolq        |      2|none            |     0|acc        |↑  |0.6700|±  |0.0082|
	|openbookqa   |      1|none            |     0|acc        |↑  |0.1900|±  |0.0176|
	|             |       |none            |     0|acc_norm   |↑  |0.2980|±  |0.0205|
	|piqa         |      1|none            |     0|acc        |↑  |0.6001|±  |0.0114|
	|             |       |none            |     0|acc_norm   |↑  |0.6072|±  |0.0114|
	|social_iqa   |      0|none            |     0|acc        |↑  |0.4299|±  |0.0112|
	|tinyArc      |      0|none            |    25|acc_norm   |↑  |0.3214|±  |   N/A|
	|tinyGSM8k    |      0|flexible-extract|     5|exact_match|↑  |0.0492|±  |   N/A|
	|             |       |strict-match    |     5|exact_match|↑  |0.0380|±  |   N/A|
	|tinyHellaswag|      0|none            |    10|acc_norm   |↑  |0.4005|±  |   N/A|
	|tinyMMLU     |      0|none            |     0|acc_norm   |↑  |0.2857|±  |   N/A|
	|winogrande   |      1|none            |     0|acc        |↑  |0.4988|±  |0.0141|
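To help read the `Value` and `Stderr` columns together, the sketch below turns each zero-shot `acc` row into an approximate 95% confidence interval and compares it with the accuracy of random guessing. It assumes a normal approximation (value ± 1.96 × stderr) and uniform guessing over each task's answer options (two for boolq, piqa and winogrande, three for social_iqa, four for openbookqa). The helper names are hypothetical, and this is a reading aid rather than part of the evaluation harness.

```python
# (accuracy, standard error, chance level) for the zero-shot `acc` rows above.
RESULTS = {
    "boolq": (0.6700, 0.0082, 1 / 2),
    "openbookqa": (0.1900, 0.0176, 1 / 4),
    "piqa": (0.6001, 0.0114, 1 / 2),
    "social_iqa": (0.4299, 0.0112, 1 / 3),
    "winogrande": (0.4988, 0.0141, 1 / 2),
}

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def confidence_interval(value: float, stderr: float, z: float = Z_95) -> tuple[float, float]:
    """Normal-approximation interval: value +/- z * stderr."""
    return value - z * stderr, value + z * stderr


def compare_to_chance(value: float, stderr: float, chance: float) -> str:
    """Classify a score by where its interval sits relative to random guessing."""
    low, high = confidence_interval(value, stderr)
    if low > chance:
        return "above chance"
    if high < chance:
        return "below chance"
    return "indistinguishable from chance"


for task, (acc, se, chance) in RESULTS.items():
    low, high = confidence_interval(acc, se)
    verdict = compare_to_chance(acc, se, chance)
    print(f"{task:<11} {acc:.4f} [{low:.4f}, {high:.4f}] chance={chance:.3f} -> {verdict}")
```

Under these assumptions, boolq, piqa and social_iqa sit clearly above chance, while winogrande's interval straddles 0.5.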