|
--- |
|
library_name: transformers |
|
license: mit |
|
datasets: |
|
- roneneldan/TinyStories |
|
- Salesforce/wikitext |
|
- abhinand/alpaca-gpt4-sharegpt |
|
- shibing624/sharegpt_gpt4 |
|
- ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions |
|
- ajibawa-2023/SlimOrca-ShareGPT |
|
- junelee/wizard_vicuna_70k |
|
- meta-math/MetaMathQA |
|
- HuggingFaceH4/MATH-500 |
|
- hkust-nlp/dart-math-pool-math |
|
- TIGER-Lab/MathInstruct |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# Arsh-llm: A Compact 500M Parameter Powerhouse 🚀 |
|
|
|
**Arsh-llm** is a 500-million-parameter language model built on the Llama architecture, designed to shine at generating creative stories, coherent text, and functional code. Pretrained for 35 hours on a T4 GPU using a curated mix of small yet powerful datasets, then fine-tuned for 15 hours on conversational data, this model is a lean, mean, text-generating machine with massive potential. With a training loss in the **1.2–1.9** range, it's already showing promise and is ready to level up with more training. Buckle up: this is just the beginning! 😎
|
|
|
## Model Overview |
|
|
|
- **Architecture**: Llama-based causal language model |
|
- **Parameters**: 500M |
|
- **Context Length**: 128 tokens |
|
- **Pretraining Duration**: ~35 hours on an NVIDIA T4 GPU

- **Fine-tuning Duration**: ~15 hours on conversational datasets
|
- **Training Loss**: 1.2–1.9 (with room to improve!) |
|
- **Library**: Transformers (Hugging Face) |
|
- **License**: MIT |
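
If you want to double-check these figures against the published checkpoint, here is a quick sketch; it assumes the config exposes the standard Llama fields (e.g. `max_position_embeddings` for the context window):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "arshiaafshani/Arsh-llm"

# Inspect the published config (assumes standard Llama config field names)
config = AutoConfig.from_pretrained(model_id)
print(config.model_type, config.max_position_embeddings)

# Count parameters after loading the weights
model = AutoModelForCausalLM.from_pretrained(model_id)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```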
|
|
|
## Datasets |
|
|
|
Arsh-llm was trained on a diverse mix of datasets to ensure versatility in storytelling, general text generation, math reasoning, and code-related tasks:
|
|
|
- **roneneldan/TinyStories**: Short, creative stories for narrative generation. |
|
- **Salesforce/wikitext**: Wikipedia-based text for general knowledge and coherence. |
|
- **abhinand/alpaca-gpt4-sharegpt**: Instruction-based conversational data for task-oriented responses. |
|
- **shibing624/sharegpt_gpt4**: High-quality conversational data for chat-like interactions. |
|
- **ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions**: Math problems with solutions to boost logical reasoning.

- **Additional instruction and math datasets** (ajibawa-2023/SlimOrca-ShareGPT, junelee/wizard_vicuna_70k, meta-math/MetaMathQA, HuggingFaceH4/MATH-500, hkust-nlp/dart-math-pool-math, TIGER-Lab/MathInstruct): broader instruction-following and math coverage, as listed in the model metadata.
|
|
|
Fine-tuning was performed on a structured ShareGPT chat template to enhance conversational abilities, making Arsh-llm a great starting point for dialogue-based applications. |
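
As a rough illustration of that formatting step, the sketch below converts a ShareGPT-style record into role/content messages and renders it with the tokenizer's chat template; the `conversations`/`from`/`value` field names follow the common ShareGPT convention and are assumed here rather than taken from the actual training scripts:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("arshiaafshani/Arsh-llm")

# A ShareGPT-style record (field names assumed; adjust to your dataset's schema)
record = {
    "conversations": [
        {"from": "human", "value": "Explain what a language model does."},
        {"from": "gpt", "value": "A language model predicts the next token in a sequence..."},
    ]
}

# Map ShareGPT roles onto the role/content format expected by apply_chat_template
role_map = {"system": "system", "human": "user", "gpt": "assistant"}
messages = [{"role": role_map[turn["from"]], "content": turn["value"]} for turn in record["conversations"]]

# Render the conversation as a single training string using the model's chat template
print(tokenizer.apply_chat_template(messages, tokenize=False))
```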
|
|
|
## Use Cases |
|
|
|
Arsh-llm is a versatile model with applications in: |
|
|
|
- **Creative Writing**: Generate engaging short stories or narrative prompts. |
|
- **Code Generation**: Produce functional code snippets for various programming tasks. |
|
- **Conversational AI**: Power chatbots or assistants with natural dialogue. |
|
- **Educational Tools**: Assist with math problem-solving or explain concepts step-by-step. |
|
|
|
> **Note**: This model is a work in progress. For production-grade performance, further pretraining on larger datasets and post-training on conversational data are recommended.
|
|
|
## Getting Started |
|
|
|
To use Arsh-llm, you can load it directly from Hugging Face: |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
# Load model and tokenizer |
|
model = AutoModelForCausalLM.from_pretrained("arshiaafshani/Arsh-llm") |
|
tokenizer = AutoTokenizer.from_pretrained("arshiaafshani/Arsh-llm") |
|
|
|
# Example: Generate a response |
|
messages = [{"role": "user", "content": "Write a short story about a brave robot."}] |
|
# add_generation_prompt appends the assistant turn so the model continues as the assistant
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=200)
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
``` |
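
For quick experiments, the standard `text-generation` pipeline also works. The sampling settings below are illustrative defaults, not values tuned for this model:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="arshiaafshani/Arsh-llm")

# Illustrative sampling settings; adjust for your use case
result = generator(
    "Once upon a time, a brave robot",
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(result[0]["generated_text"])
```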
|
|
|
## Training Details |
|
|
|
- **Pretraining**: Conducted on a T4 GPU for ~35 hours using a mix of TinyStories, WikiText, and other datasets to build a strong foundation in text and story generation.

- **Fine-tuning**: ~15 hours on ShareGPT-based conversational data with a structured chat template to enhance dialogue capabilities.

- **Hardware**: NVIDIA T4 GPU (15 GB VRAM).

- **Training Loss**: Final loss in the 1.2–1.9 range, indicating solid performance for the model's size with significant room for improvement through extended training.
|
|
|
## Limitations |
|
|
|
- **Current Stage**: Arsh-llm is not yet fully optimized. It performs well for its size but requires additional training to compete with larger models. |
|
- **Dataset Size**: Pretrained on relatively small datasets, which limits its generalization. Scaling up to larger datasets will unlock its full potential. |
|
- **Context Length**: Limited to 128 tokens, which may constrain performance on longer sequences. |
|
- **Not Production-Ready**: This model is best used as a base for further fine-tuning rather than as a standalone solution. |
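
Since the model is intended mainly as a base for further fine-tuning, the sketch below shows one way to continue training it with the plain `Trainer` API. The dataset slice, hyperparameters, and output directory are placeholder assumptions for illustration, not the settings used to train Arsh-llm:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "arshiaafshani/Arsh-llm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Llama-style tokenizers often ship without a pad token; reuse EOS for padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder data: a small slice of TinyStories; swap in your own corpus
dataset = load_dataset("roneneldan/TinyStories", split="train[:1%]")

def tokenize(batch):
    # Truncate to the model's 128-token context window
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Standard causal-LM collator (labels are copied from input_ids; the model shifts them internally)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="arsh-llm-finetuned",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=2e-5,
    logging_steps=50,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```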
|
|
|
## Future Plans |
|
|
|
The journey doesn’t end here! Arsh-llm is set to evolve with: |
|
|
|
- **Extended Pretraining**: Leveraging larger datasets for broader knowledge and better generalization. |
|
- **Conversational Fine-tuning**: Enhancing dialogue capabilities with advanced post-training techniques. |
|
- **Benchmarking**: Evaluating performance against similar models (e.g., TinyLlama, Phi-1.5) on tasks like MMLU, HumanEval, and GSM8K. |
|
- **Community Feedback**: Incorporating user insights to refine and improve the model. |
|
|
|
Stay tuned—Arsh-llm is on its way to becoming a legend! 🔥 |
|
|
|
## License |
|
|
|
This model is licensed under the MIT License, allowing for flexible use in both research and commercial applications. Feel free to build upon, modify, or share it! |
|
|
|
## Acknowledgments |
|
|
|
- Built with ❤️ by Arshia Afshani. |
|
- Powered by the Hugging Face Transformers library. |
|
- Thanks to the open-source community for providing the amazing datasets that made this model possible. |
|
|
|
--- |
|
|
|
**Ready to take Arsh-llm for a spin?** Clone it, train it, and let’s make it a superstar together! 🌟 For questions, feedback, or collabs, reach out via Hugging Face or open an issue in the repo. |