---
license: mit
datasets:
- deepseek-ai/DeepSeek-ProverBench
language:
- en
metrics:
- character
pipeline_tag: text-generation
tags:
- research
---
# Leaf
An open-source prototype AI model used for AI research.
## About this project
Leaf is an experimental AI model built with PyTorch.
## Research
With Leaf we have been testing what a small AI model can do.
Starting with a simple dataset embedded directly in the Python training script, Leaf trains for only 2,700 steps (more steps generally let it learn better). A minimal sketch of this character-level training setup follows the data sample below.
**Training Data:**
```json
{"text": "this is a much longer text that will serve as a simple dataset for our tiny language model. The model will learn to predict the next character based on the previous characters in the sequence."}
{"text": "This demonstrates the core idea behind training an autoregressive language model. The quick brown fox jumps over the lazy dog."}
{"text": "A journey of a thousand miles begins with a single step. The early bird catches the worm. All that glitters is not gold. A stitch in time saves nine."}
{"text": "Where there's a will, there's a way. Look before you leap. You can't make an omelette without breaking a few eggs. Practice makes perfect. Don't count your chickens before they hatch."}
```
However, training on this data produced only the following sample output:
`text that will serve`
We then tried JSONL datasets from the community, and unfortunately this was the output:
`rimetricE7tich then`
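For context, the snippet below sketches how such community JSONL files might be read into a single training corpus before building the character vocabulary. The file name and the assumption that each line carries a `"text"` field are placeholders based on the samples above, not a description of the actual files used.

```python
# Sketch: load JSONL records into one corpus string (file name and schema are assumptions).
import json

def load_jsonl_text(path):
    texts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            texts.append(record["text"])  # assumes a "text" field per record
    return " ".join(texts)

corpus = load_jsonl_text("community.jsonl")  # placeholder file name
```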