---
library_name: transformers
license: mit
datasets:
- roneneldan/TinyStories
- Salesforce/wikitext
- abhinand/alpaca-gpt4-sharegpt
- shibing624/sharegpt_gpt4
- ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions
- ajibawa-2023/SlimOrca-ShareGPT
- junelee/wizard_vicuna_70k
- meta-math/MetaMathQA
- HuggingFaceH4/MATH-500
- hkust-nlp/dart-math-pool-math
- TIGER-Lab/MathInstruct
language:
- en
pipeline_tag: text-generation
---
# Arsh-llm: A Compact 500M Parameter Powerhouse 🚀
**Arsh-llm** is a 500-million-parameter language model built on the Llama architecture, designed to shine in generating creative stories, coherent text, and functional code. Pretrained for 35 hours on a T4 GPU using a curated mix of small yet powerful datasets, and fine-tuned for 15 hours on conversational data, this model is a lean, mean, text-generating machine with massive potential. With a training loss in the **1.2–1.9** range, it’s already showing promise and is ready to level up with more training. Buckle up: this is just the beginning! 😎
## Model Overview
- **Architecture**: Llama-based causal language model
- **Parameters**: 500M
- **Context Length**: 128 tokens
- **Pretraining Duration**: \~35 hours on NVIDIA T4 GPU
- **Fine-tuning Duration**: \~15 hours on conversational datasets
- **Training Loss**: 1.2–1.9 (with room to improve!)
- **Library**: Transformers (Hugging Face)
- **License**: MIT
## Datasets
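Arsh-llm was trained on a diverse mix of datasets chosen for versatility in storytelling, text generation, and code-related tasks. The main sources are listed below; the model card metadata also lists additional conversational and math datasets (e.g., SlimOrca-ShareGPT, MetaMathQA, MathInstruct).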
- **roneneldan/TinyStories**: Short, creative stories for narrative generation.
- **Salesforce/wikitext**: Wikipedia-based text for general knowledge and coherence.
- **abhinand/alpaca-gpt4-sharegpt**: Instruction-based conversational data for task-oriented responses.
- **shibing624/sharegpt_gpt4**: High-quality conversational data for chat-like interactions.
- **ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions**: Math problems with solutions to boost logical reasoning.
Fine-tuning used a structured ShareGPT-style chat template to enhance conversational abilities, making Arsh-llm a good starting point for dialogue-based applications.
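For reference, ShareGPT-style datasets typically store each dialogue as a list of turns keyed by `from`/`value` (with speakers such as `human` and `gpt`), while `tokenizer.apply_chat_template` expects `role`/`content` messages. The sketch below converts one record into that form. The field names, the role mapping, and the `sharegpt_to_messages` helper are assumptions about the common ShareGPT layout, not the exact preprocessing used for Arsh-llm.

```python
# Hypothetical mapping from common ShareGPT speaker tags to chat-template roles
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def sharegpt_to_messages(record: dict) -> list[dict]:
    """Convert a ShareGPT-style record into role/content chat messages.

    Assumes the record looks like {"conversations": [{"from": ..., "value": ...}, ...]}.
    """
    messages = []
    for turn in record["conversations"]:
        role = ROLE_MAP.get(turn["from"])
        if role is None:
            raise ValueError(f"unknown speaker tag: {turn['from']!r}")
        messages.append({"role": role, "content": turn["value"]})
    return messages


example = {
    "conversations": [
        {"from": "human", "value": "What is 2 + 3?"},
        {"from": "gpt", "value": "2 + 3 = 5."},
    ]
}
print(sharegpt_to_messages(example))
# [{'role': 'user', 'content': 'What is 2 + 3?'}, {'role': 'assistant', 'content': '2 + 3 = 5.'}]
```

The resulting list has the same shape as the `messages` passed to `tokenizer.apply_chat_template` in the Getting Started example.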
## Use Cases
Arsh-llm is a versatile model with applications in:
- **Creative Writing**: Generate engaging short stories or narrative prompts.
- **Code Generation**: Produce functional code snippets for various programming tasks.
- **Conversational AI**: Power chatbots or assistants with natural dialogue.
- **Educational Tools**: Assist with math problem-solving or explain concepts step-by-step.
> **Note**: This model is a work in progress. For production-grade performance, further pretraining on larger datasets and post-training on conversational data are recommended.
## Getting Started
To use Arsh-llm, you can load it directly from Hugging Face:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("arshiaafshani/Arsh-llm")
tokenizer = AutoTokenizer.from_pretrained("arshiaafshani/Arsh-llm")

# Example: generate a response
messages = [{"role": "user", "content": "Write a short story about a brave robot."}]
# Apply the chat template and append the assistant prompt so the model replies
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
)
prompt_len = inputs["input_ids"].shape[-1]
# The context window is 128 tokens, so cap prompt + new tokens at 128
outputs = model.generate(**inputs, max_new_tokens=128 - prompt_len)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```
## Training Details
- **Pretraining**: Conducted on a T4 GPU for \~35 hours using a mix of TinyStories, WikiText, and other datasets to build a strong foundation in text and story generation.
- **Fine-tuning**: 15 hours on ShareGPT-based conversational data with a structured chat template to enhance dialogue capabilities.
- **Hardware**: NVIDIA T4 GPU (16 GB VRAM, roughly 15 GB usable).
- **Training Loss**: 1.2–1.9. This is a promising start for a model of this size, though training loss alone is not a benchmark, and there is significant room for improvement through extended training.
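If the reported loss is the mean per-token cross-entropy in nats (the default for causal language models in Transformers; this is an assumption about how these numbers were logged), it converts to perplexity as `exp(loss)`. A quick check of the stated range, using a hypothetical `loss_to_perplexity` helper:

```python
import math


def loss_to_perplexity(loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy measured in nats (assumption)."""
    return math.exp(loss)


for loss in (1.2, 1.9):
    print(f"loss {loss:.1f} -> perplexity {loss_to_perplexity(loss):.2f}")
# loss 1.2 -> perplexity 3.32
# loss 1.9 -> perplexity 6.69
```

Informally, the model is about as uncertain as choosing among 3 to 7 equally likely tokens at each step on its training data; a held-out evaluation would be needed to judge how well that carries over to unseen text.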
## Limitations
- **Current Stage**: Arsh-llm is not yet fully optimized. It performs well for its size but requires additional training to compete with larger models.
- **Dataset Size**: Pretrained on relatively small datasets, which limits its generalization. Scaling up to larger datasets will unlock its full potential.
- **Context Length**: Limited to 128 tokens (prompt plus generated text), so longer documents and multi-turn conversations must be truncated or split.
- **Not Production-Ready**: This model is best used as a base for further fine-tuning rather than as a standalone solution.
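One common workaround for a short context window is to split long token sequences into overlapping windows and process each one separately. The sketch below works on plain lists of token IDs (for example, the `input_ids` produced by the tokenizer). The `chunk_token_ids` helper and the 16-token overlap are illustrative choices, not part of Arsh-llm's tooling.

```python
def chunk_token_ids(token_ids: list[int], window: int = 128, overlap: int = 16) -> list[list[int]]:
    """Split token IDs into windows of at most `window` tokens.

    Consecutive windows share `overlap` tokens so some context carries over.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    stride = window - overlap
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the sequence
    return chunks


ids = list(range(300))  # stand-in for real token IDs
print([len(c) for c in chunk_token_ids(ids)])
# [128, 128, 76]
```

Each window is still processed independently, so the model never sees more than 128 tokens at once; this helps with tasks like scoring or summarizing chunks, not with long-range coherence.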
## Future Plans
The journey doesn’t end here! Arsh-llm is set to evolve with:
- **Extended Pretraining**: Leveraging larger datasets for broader knowledge and better generalization.
- **Conversational Fine-tuning**: Enhancing dialogue capabilities with advanced post-training techniques.
- **Benchmarking**: Evaluating performance against similar models (e.g., TinyLlama, Phi-1.5) on tasks like MMLU, HumanEval, and GSM8K.
- **Community Feedback**: Incorporating user insights to refine and improve the model.
Stay tuned—Arsh-llm is on its way to becoming a legend! 🔥
## License
This model is licensed under the MIT License, allowing for flexible use in both research and commercial applications. Feel free to build upon, modify, or share it!
## Acknowledgments
- Built with ❤️ by Arshia Afshani.
- Powered by the Hugging Face Transformers library.
- Thanks to the open-source community for providing the amazing datasets that made this model possible.
---
**Ready to take Arsh-llm for a spin?** Clone it, train it, and let’s make it a superstar together! 🌟 For questions, feedback, or collabs, reach out via Hugging Face or open an issue in the repo.