Arsh-llm: A Compact 500M Parameter Powerhouse 🚀

Arsh-llm is a 500-million-parameter language model built on the Llama architecture, designed to shine in generating creative stories, coherent text, and functional code. Pretrained for 35 hours on a T4 GPU using a curated mix of small yet powerful datasets, and fine-tuned for 5 hours on conversational data, this model is a lean, mean, text-generating machine with massive potential. With a training loss between 1.2 and 1.9, it's already showing promise and is ready to level up with more training. Buckle up: this is just the beginning! 😎

Model Overview

  • Architecture: Llama-based causal language model
  • Parameters: 500M
  • Context Length: 128 tokens
  • Pretraining Duration: ~35 hours on NVIDIA T4 GPU
  • Fine-tuning Duration: ~5 hours on conversational datasets
  • Training Loss: 1.2–1.9 (with room to improve!)
  • Library: Transformers (Hugging Face)
  • License: MIT
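
For a sense of what the parameter count means in practice, here is a rough back-of-the-envelope sketch of the memory needed just to hold the weights. The 500M figure comes from the overview above; the bytes-per-parameter table and the function name `weight_memory_gib` are illustrative assumptions, and the estimate ignores activations, the KV cache, and optimizer state.

```python
# Rough weight-memory estimate. Only the 500M parameter count comes from the
# model card; the dtype table and function name are illustrative assumptions.
BYTES_PER_DTYPE = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str = "float32") -> float:
    """Return the approximate size of the weights alone, in GiB."""
    return num_params * BYTES_PER_DTYPE[dtype] / 1024**3


for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: {weight_memory_gib(500_000_000, dtype):.2f} GiB")
```

At 32-bit precision the weights come to a little under 2 GiB, which fits comfortably on a single T4.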

Datasets

Arsh-llm was trained on a diverse set of datasets to ensure versatility in storytelling, text generation, and code-related tasks:

  • roneneldan/TinyStories: Short, creative stories for narrative generation.
  • Salesforce/wikitext: Wikipedia-based text for general knowledge and coherence.
  • abhinand/alpaca-gpt4-sharegpt: Instruction-based conversational data for task-oriented responses.
  • shibing624/sharegpt_gpt4: High-quality conversational data for chat-like interactions.
  • ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions: Math problems with solutions to boost logical reasoning.

Fine-tuning used conversations formatted with a structured ShareGPT chat template to enhance conversational abilities, making Arsh-llm a great starting point for dialogue-based applications.
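
As a minimal sketch of what preparing ShareGPT-style data for a chat template can look like, the snippet below converts one record into the `role`/`content` message list used by chat templates in Transformers. It assumes the common ShareGPT layout (a `conversations` list whose turns have `from` and `value` keys); the actual preprocessing used for Arsh-llm is not documented here, and `sharegpt_to_messages` and `ROLE_MAP` are hypothetical names.

```python
# Assumed ShareGPT-style layout: {"conversations": [{"from": ..., "value": ...}, ...]}
# The mapping and helper below are illustrative, not the author's pipeline.
ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}


def sharegpt_to_messages(record: dict) -> list[dict]:
    """Convert one ShareGPT-style record into role/content chat messages."""
    messages = []
    for turn in record["conversations"]:
        role = ROLE_MAP.get(turn["from"])
        if role is None:
            raise ValueError(f"unknown speaker: {turn['from']!r}")
        messages.append({"role": role, "content": turn["value"]})
    return messages


example = {
    "conversations": [
        {"from": "human", "value": "Tell me a story about a robot."},
        {"from": "gpt", "value": "Once upon a time, a small robot woke up."},
    ]
}
messages = sharegpt_to_messages(example)
print(messages)
```

The resulting list has the same shape as the `messages` argument passed to `tokenizer.apply_chat_template` in the Getting Started example.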

Use Cases

Arsh-llm is a versatile model with applications in:

  • Creative Writing: Generate engaging short stories or narrative prompts.
  • Code Generation: Produce functional code snippets for various programming tasks.
  • Conversational AI: Power chatbots or assistants with natural dialogue.
  • Educational Tools: Assist with math problem-solving or explain concepts step-by-step.

Note: This model is a work in progress. For production-grade performance, further pretraining on larger datasets and post-training on conversational data are recommended.

Getting Started

To use Arsh-llm, you can load it directly from Hugging Face:

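from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("arshiaafshani/Arsh-llm")
tokenizer = AutoTokenizer.from_pretrained("arshiaafshani/Arsh-llm")

# Example: Generate a response
messages = [{"role": "user", "content": "Write a short story about a brave robot."}]
# add_generation_prompt=True appends the assistant turn marker so the model
# answers the user instead of continuing the user's message
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt")
# max_length counts prompt plus generated tokens; it is kept at the
# 128-token context length listed in the Model Overview
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))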

Training Details

  • Pretraining: Conducted on a T4 GPU for ~35 hours using a mix of TinyStories, WikiText, and other datasets to build a strong foundation in text and story generation.
  • Fine-tuning: 5 hours on ShareGPT-based conversational data with a structured chat template to enhance dialogue capabilities.
  • Hardware: NVIDIA T4 GPU (15GB VRAM).
  • Training Loss: Achieved 1.2–1.9, indicating solid performance with significant potential for improvement through extended training.
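
To put the reported loss range in more familiar terms, the sketch below converts it to perplexity. This assumes the loss is the mean per-token cross-entropy in nats (the default for PyTorch-style language-model training); the card does not state this, so treat the numbers as indicative only. Note that these are training-set figures, not held-out evaluation results.

```python
import math


def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity is exp(loss) when the loss is mean cross-entropy in nats (assumed)."""
    return math.exp(cross_entropy_loss)


low, high = perplexity(1.2), perplexity(1.9)
print(f"perplexity range: {low:.2f} to {high:.2f}")  # perplexity range: 3.32 to 6.69
```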

Limitations

  • Current Stage: Arsh-llm is not yet fully optimized. It performs well for its size but requires additional training to compete with larger models.
  • Dataset Size: Pretrained on relatively small datasets, which limits its generalization. Scaling up to larger datasets will unlock its full potential.
  • Context Length: Limited to 128 tokens, which may constrain performance on longer sequences.
  • Not Production-Ready: This model is best used as a base for further fine-tuning rather than as a standalone solution.
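
One practical consequence of the 128-token context is that prompt and completion share a small budget. The helper below is a hypothetical sketch (the name `fit_prompt` and the default of 32 reserved tokens are assumptions, not part of the model or of Transformers): it left-truncates a list of prompt token IDs and reports how many new tokens can still be generated.

```python
CONTEXT_LENGTH = 128  # from the model card


def fit_prompt(
    token_ids: list[int],
    min_new_tokens: int = 32,
    context_length: int = CONTEXT_LENGTH,
) -> tuple[list[int], int]:
    """Keep the most recent prompt tokens so at least `min_new_tokens` fit.

    Returns the (possibly truncated) prompt IDs and the remaining token budget.
    """
    if not 0 < min_new_tokens < context_length:
        raise ValueError("min_new_tokens must be between 1 and context_length - 1")
    max_prompt = context_length - min_new_tokens
    kept = token_ids[-max_prompt:]  # drop the oldest tokens first
    return kept, context_length - len(kept)


kept, budget = fit_prompt(list(range(200)))
print(len(kept), budget)  # 96 32
```

The returned budget could be passed as `max_new_tokens` to `model.generate`. Left truncation can cut into the opening of a chat-formatted prompt, so for chat use it may be preferable to drop whole earlier turns instead.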

Future Plans

The journey doesn't end here! Arsh-llm is set to evolve with:

  • Extended Pretraining: Leveraging larger datasets for broader knowledge and better generalization.
  • Conversational Fine-tuning: Enhancing dialogue capabilities with advanced post-training techniques.
  • Benchmarking: Evaluating performance against similar models (e.g., TinyLlama, Phi-1.5) on benchmarks such as MMLU, HumanEval, and GSM8K.
  • Community Feedback: Incorporating user insights to refine and improve the model.

Stay tuned: Arsh-llm is on its way to becoming a legend! 🔥

License

This model is licensed under the MIT License, allowing for flexible use in both research and commercial applications. Feel free to build upon, modify, or share it!

Acknowledgments

  • Built with ❤️ by Arshia Afshani.
  • Powered by the Hugging Face Transformers library.
  • Thanks to the open-source community for providing the amazing datasets that made this model possible.

Ready to take Arsh-llm for a spin? Clone it, train it, and let's make it a superstar together! 🌟 For questions, feedback, or collabs, reach out via Hugging Face or open an issue in the repo.
