ganchito committed on
Commit 5dee7d4 · verified · 1 Parent(s): 861e8cb

Upload 2 files

Files changed (2):
1. Modelfile +30 -0
2. README.md +145 -3
Modelfile ADDED
@@ -0,0 +1,30 @@
+ FROM Dante-7B.gguf
+
+ # Model metadata
+ PARAMETER stop "<|im_end|>"
+ PARAMETER stop "<|endoftext|>"
+ PARAMETER stop "<|im_start|>"
+ PARAMETER stop "<|endoftext|>"
+
+ # System prompt for the model
+ SYSTEM """You are Dante, a 7B parameter language model based on Qwen2 architecture. You are a helpful, creative, and intelligent AI assistant. You can engage in conversations, answer questions, help with tasks, and provide thoughtful responses. Always be respectful, honest, and helpful while maintaining a conversational and engaging tone."""
+
+ # Template for chat interactions (simplified for Ollama compatibility)
+ TEMPLATE """{{ if .System }}<|im_start|>system
+ {{ .System }}<|im_end|>
+ {{ end }}{{ if .Prompt }}<|im_start|>user
+ {{ .Prompt }}<|im_end|>
+ {{ end }}<|im_start|>assistant
+ {{ .Response }}<|im_end|>"""
+
+ # Model parameters optimized for Qwen2 architecture
+ PARAMETER temperature 0.7
+ PARAMETER top_p 0.9
+ PARAMETER top_k 40
+ PARAMETER repeat_penalty 1.1
+ PARAMETER num_ctx 32768
+ PARAMETER num_gpu 1
+ PARAMETER num_thread 8
+
+ # License and model information
+ LICENSE """This model is based on Dante-7B, a language model derived from Qwen2 architecture. Please refer to the original model's license terms."""
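As a quick sanity check, the stop tokens declared in the Modelfile above can be compared against the ChatML-style markers used in its template. A minimal Python sketch, with the relevant Modelfile lines inlined for illustration:

```python
import re

# Relevant Modelfile lines, inlined for illustration.
MODELFILE = '''
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
PARAMETER stop "<|im_start|>"
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>"""
'''

# Declared stop tokens.
stops = set(re.findall(r'PARAMETER stop "([^"]+)"', MODELFILE))
# ChatML-style markers appearing anywhere in the Modelfile.
markers = set(re.findall(r"<\|[a-z_]+\|>", MODELFILE))

print(sorted(stops))
print(sorted(markers - stops))  # markers with no matching stop token: none here
```

A check like this catches typos such as a template marker that never appears in the stop list.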
README.md CHANGED
@@ -1,3 +1,145 @@
- ---
- license: mit
- ---
+ ---
+ license: apache-2.0
+ base_model:
+ - Qwen/Qwen2.5-Coder-7B-Instruct
+ ---
+
+ # Dante-7B GGUF for Ollama
+
+ This repository contains the Dante-7B model converted to GGUF format for use with Ollama, along with an optimized Modelfile for easy deployment.
+
+ ## About Dante-7B
+
+ Dante-7B is a 7-billion-parameter model trained by [Outflank](https://www.outflank.nl/) to generate Windows shellcode loaders. The original model is based on the Qwen2.5-Coder-7B-Instruct architecture.
+
+ - Original Blog: https://outflank.nl/blog/2025/08/07/training-specialist-models
+ - Original Demo: https://huggingface.co/spaces/outflanknl/Dante-7B-Demo
+ - Original Repository: https://huggingface.co/outflanknl/Dante-7B
+
+ ## Conversion Process
+
+ This GGUF version was created following these steps:
+
+ ### 1. Model Download
+ ```bash
+ # Clone the original repository
+ git clone https://huggingface.co/outflanknl/Dante-7B
+ cd Dante-7B
+
+ # Install Git LFS if not already installed (macOS; use your package manager elsewhere)
+ brew install git-lfs
+ git lfs install
+
+ # Pull the large model files
+ git lfs pull
+ ```
+
+ ### 2. Dependencies Installation
+ ```bash
+ # Install llama.cpp Python dependencies
+ cd ~/Downloads/llama.cpp
+ pip3 install torch torchvision torchaudio
+ pip3 install mistral-common gguf
+
+ # Install system dependencies
+ brew install sentencepiece
+ ```
+
+ ### 3. GGUF Conversion
+ ```bash
+ # Convert from Hugging Face format to GGUF
+ python3 convert_hf_to_gguf.py ~/Downloads/Dante-7B --outfile ~/Downloads/Dante-7B.gguf
+ ```
+
+ ### 4. Ollama Modelfile Creation
+ A custom Modelfile was created with:
+ - Optimized parameters for the Qwen2 architecture
+ - 32K context window support
+ - Proper stop tokens for the model
+ - A simplified chat template for Ollama compatibility
+
+ ## Files Included
+
+ - **Dante-7B.gguf**: The converted model file (~15GB)
+ - **Modelfile**: Optimized configuration for Ollama deployment
+
+ ## Usage with Ollama
+
+ ### 1. Create the Model
+ ```bash
+ ollama create dante-7b -f Modelfile
+ ```
+
+ ### 2. Run the Model
+ ```bash
+ ollama run dante-7b
+ ```
+
+ ### 3. Environment Variables (Optional)
+ You can set these environment variables for optimal performance:
+ ```bash
+ export OLLAMA_CONTEXT_LENGTH="32768"
+ export OLLAMA_NUM_GPU="1"
+ export OLLAMA_NUM_THREAD="8"
+ ```
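Beyond the CLI, a running Ollama server also exposes an HTTP API on port 11434. A minimal Python sketch using only the standard library (the model name `dante-7b` assumes the `ollama create` step above; per-request `options` override the Modelfile defaults):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_request(prompt: str, model: str = "dante-7b") -> dict:
    # Per-request options override the Modelfile defaults.
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": 0.7, "num_ctx": 32768},
    }

def generate(prompt: str) -> str:
    # Requires a running Ollama server.
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (with the server running): generate("Write a short greeting.")
```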
+
+ ## Model Specifications
+
+ - **Architecture**: Qwen2.5-Coder-7B-Instruct
+ - **Parameters**: 7 billion
+ - **Context Length**: 32,768 tokens
+ - **Format**: GGUF (optimized for Ollama)
+ - **Base Model**: Qwen/Qwen2.5-Coder-7B-Instruct
+
+ ## Performance Notes
+
+ - **Memory Usage**: ~15GB for the model file
+ - **Recommended RAM**: 24GB+ for optimal performance
+ - **GPU Support**: Metal acceleration on macOS, CUDA on Linux/Windows
+ - **CPU Fallback**: Available, but noticeably slower
+
+ ## License
+
+ This model is based on Dante-7B, a language model derived from the Qwen2 architecture. Please refer to the original model's license terms (Apache 2.0).
+
+ ## Acknowledgments
+
+ - **Outflank**: Original model training and research
+ - **Qwen Team**: Base model architecture
+ - **llama.cpp**: GGUF conversion tools
+ - **Ollama**: Deployment platform
+
+ ## Support
+
+ For issues related to:
+ - **Model conversion**: Check the llama.cpp documentation
+ - **Ollama deployment**: Check the Ollama documentation
+ - **Original model**: Contact the Outflank team
+
+ ## Example Usage
+
+ ```bash
+ # Basic conversation
+ ollama run dante-7b "Hello, can you help me with shellcode generation?"
+
+ # Role-framed prompt (the SYSTEM prompt from the Modelfile still applies)
+ ollama run dante-7b "You are a cybersecurity expert. Explain the concept of shellcode loaders."
+
+ # Batch processing: feed a prompt from a file and capture the output
+ ollama run dante-7b "$(cat input.txt)" > output.txt
+ ```
+
+ ## Technical Details
+
+ The conversion process preserves:
+ - All original model weights and architecture
+ - Tokenizer and vocabulary
+ - Model metadata and configuration
+ - Qwen2-specific formatting and tokens
+
+ The Modelfile optimizes:
+ - Temperature: 0.7 (balanced creativity)
+ - Top-p: 0.9 (nucleus sampling)
+ - Top-k: 40 (diversity control)
+ - Repeat penalty: 1.1 (repetition control)
+ - Context management: 32K tokens
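The sampling parameters above can be illustrated on a toy distribution. A minimal sketch of how top-k and top-p (nucleus) filtering narrow the candidate token set before sampling; this is an illustration of the technique, not Ollama's internal implementation:

```python
def top_k_filter(probs: dict, k: int) -> dict:
    # Keep only the k most probable tokens, then renormalize.
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {t: p / total for t, p in kept}

def top_p_filter(probs: dict, p: float) -> dict:
    # Nucleus sampling: keep the smallest high-probability set whose
    # cumulative mass reaches p, then renormalize.
    kept, cum = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = pr
        cum += pr
        if cum >= p:
            break
    total = sum(kept.values())
    return {t: pr / total for t, pr in kept.items()}

# Toy next-token distribution.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_k_filter(probs, 2))   # 'a' and 'b' kept and renormalized
print(top_p_filter(probs, 0.9)) # keeps 'a', 'b', 'c'; 'd' is cut
```

The repeat penalty works differently: it scales down the probability of tokens that already appear in the context, discouraging loops.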