ganchito committed on
Commit 5dee7d4 · verified · 1 Parent(s): 861e8cb

Upload 2 files

Files changed (2):
1. Modelfile +30 -0
2. README.md +145 -3
Modelfile ADDED
@@ -0,0 +1,30 @@
+ FROM Dante-7B.gguf
+
+ # Model metadata
+ PARAMETER stop "<|im_end|>"
+ PARAMETER stop "<|endoftext|>"
+ PARAMETER stop "<|im_start|>"
+ PARAMETER stop "<|endoftext|>"
+
+ # System prompt for the model
+ SYSTEM """You are Dante, a 7B parameter language model based on Qwen2 architecture. You are a helpful, creative, and intelligent AI assistant. You can engage in conversations, answer questions, help with tasks, and provide thoughtful responses. Always be respectful, honest, and helpful while maintaining a conversational and engaging tone."""
+
+ # Template for chat interactions (simplified for Ollama compatibility)
+ TEMPLATE """{{ if .System }}<|im_start|>system
+ {{ .System }}<|im_end|>
+ {{ end }}{{ if .Prompt }}<|im_start|>user
+ {{ .Prompt }}<|im_end|>
+ {{ end }}<|im_start|>assistant
+ {{ .Response }}<|im_end|>"""
+
+ # Model parameters optimized for Qwen2 architecture
+ PARAMETER temperature 0.7
+ PARAMETER top_p 0.9
+ PARAMETER top_k 40
+ PARAMETER repeat_penalty 1.1
+ PARAMETER num_ctx 32768
+ PARAMETER num_gpu 1
+ PARAMETER num_thread 8
+
+ # License and model information
+ LICENSE """This model is based on Dante-7B, a language model derived from Qwen2 architecture. Please refer to the original model's license terms."""
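As a quick sanity check, the stop tokens declared in the Modelfile above can be compared against the ChatML-style markers used in its template. A minimal Python sketch, with the relevant Modelfile lines inlined for illustration:

```python
import re

# Relevant Modelfile lines, inlined for illustration.
MODELFILE = '''
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
PARAMETER stop "<|im_start|>"
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>"""
'''

# Declared stop tokens.
stops = set(re.findall(r'PARAMETER stop "([^"]+)"', MODELFILE))
# ChatML-style markers appearing anywhere in the Modelfile.
markers = set(re.findall(r"<\|[a-z_]+\|>", MODELFILE))

print(sorted(stops))
print(sorted(markers - stops))  # markers with no matching stop token: none here
```

A check like this catches typos such as a template marker that never appears in the stop list.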
README.md CHANGED
@@ -1,3 +1,145 @@
- ---
- license: mit
- ---
+ ---
+ license: apache-2.0
+ base_model:
+ - Qwen/Qwen2.5-Coder-7B-Instruct
+ ---
+
+ # Dante-7B GGUF for Ollama
+
+ This repository contains the Dante-7B model converted to GGUF format for use with Ollama, along with an optimized Modelfile for easy deployment.
+
+ ## About Dante-7B
+
+ Dante-7B is a 7-billion-parameter model trained by [Outflank](https://www.outflank.nl/) to generate Windows shellcode loaders. The original model is based on the Qwen2.5-Coder-7B-Instruct architecture.
+
+ - Original Blog: https://outflank.nl/blog/2025/08/07/training-specialist-models
+ - Original Demo: https://huggingface.co/spaces/outflanknl/Dante-7B-Demo
+ - Original Repository: https://huggingface.co/outflanknl/Dante-7B
+
+ ## Conversion Process
+
+ This GGUF version was created following these steps:
+
+ ### 1. Model Download
+ ```bash
+ # Clone the original repository
+ git clone https://huggingface.co/outflanknl/Dante-7B
+ cd Dante-7B
+
+ # Install Git LFS if not already installed (macOS; use your package manager elsewhere)
+ brew install git-lfs
+ git lfs install
+
+ # Pull the large model files
+ git lfs pull
+ ```
+
+ ### 2. Dependencies Installation
+ ```bash
+ # Install llama.cpp Python dependencies
+ cd ~/Downloads/llama.cpp
+ pip3 install torch torchvision torchaudio
+ pip3 install mistral-common gguf
+
+ # Install system dependencies
+ brew install sentencepiece
+ ```
+
+ ### 3. GGUF Conversion
+ ```bash
+ # Convert from Hugging Face format to GGUF
+ python3 convert_hf_to_gguf.py ~/Downloads/Dante-7B --outfile ~/Downloads/Dante-7B.gguf
+ ```
+
+ ### 4. Ollama Modelfile Creation
+ A custom Modelfile was created with:
+ - Optimized parameters for the Qwen2 architecture
+ - 32K context window support
+ - Proper stop tokens for the model
+ - A simplified chat template for Ollama compatibility
+
+ ## Files Included
+
+ - **Dante-7B.gguf**: The converted model file (~15GB)
+ - **Modelfile**: Optimized configuration for Ollama deployment
+
+ ## Usage with Ollama
+
+ ### 1. Create the Model
+ ```bash
+ ollama create dante-7b -f Modelfile
+ ```
+
+ ### 2. Run the Model
+ ```bash
+ ollama run dante-7b
+ ```
+
+ ### 3. Environment Variables (Optional)
+ You can set these environment variables for optimal performance:
+ ```bash
+ export OLLAMA_CONTEXT_LENGTH="32768"
+ export OLLAMA_NUM_GPU="1"
+ export OLLAMA_NUM_THREAD="8"
+ ```
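Beyond the CLI, a running Ollama server also exposes an HTTP API on port 11434. A minimal Python sketch using only the standard library (the model name `dante-7b` assumes the `ollama create` step above; per-request `options` override the Modelfile defaults):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_request(prompt: str, model: str = "dante-7b") -> dict:
    # Per-request options override the Modelfile defaults.
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": 0.7, "num_ctx": 32768},
    }

def generate(prompt: str) -> str:
    # Requires a running Ollama server.
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (with the server running): generate("Write a short greeting.")
```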
+
+ ## Model Specifications
+
+ - **Architecture**: Qwen2.5-Coder-7B-Instruct
+ - **Parameters**: 7 billion
+ - **Context Length**: 32,768 tokens
+ - **Format**: GGUF (optimized for Ollama)
+ - **Base Model**: Qwen/Qwen2.5-Coder-7B-Instruct
+
+ ## Performance Notes
+
+ - **Memory Usage**: ~15GB for the model file
+ - **Recommended RAM**: 24GB+ for optimal performance
+ - **GPU Support**: Metal acceleration on macOS, CUDA on Linux/Windows
+ - **CPU Fallback**: Available, but noticeably slower
+
+ ## License
+
+ This model is based on Dante-7B, a language model derived from the Qwen2 architecture. Please refer to the original model's license terms (Apache 2.0).
+
+ ## Acknowledgments
+
+ - **Outflank**: Original model training and research
+ - **Qwen Team**: Base model architecture
+ - **llama.cpp**: GGUF conversion tools
+ - **Ollama**: Deployment platform
+
+ ## Support
+
+ For issues related to:
+ - **Model conversion**: Check the llama.cpp documentation
+ - **Ollama deployment**: Check the Ollama documentation
+ - **Original model**: Contact the Outflank team
+
+ ## Example Usage
+
+ ```bash
+ # Basic conversation
+ ollama run dante-7b "Hello, can you help me with shellcode generation?"
+
+ # Role-framed prompt (the SYSTEM prompt from the Modelfile still applies)
+ ollama run dante-7b "You are a cybersecurity expert. Explain the concept of shellcode loaders."
+
+ # Batch processing: feed a prompt from a file and capture the output
+ ollama run dante-7b "$(cat input.txt)" > output.txt
+ ```
+
+ ## Technical Details
+
+ The conversion process preserves:
+ - All original model weights and architecture
+ - Tokenizer and vocabulary
+ - Model metadata and configuration
+ - Qwen2-specific formatting and tokens
+
+ The Modelfile optimizes:
+ - Temperature: 0.7 (balanced creativity)
+ - Top-p: 0.9 (nucleus sampling)
+ - Top-k: 40 (diversity control)
+ - Repeat penalty: 1.1 (repetition control)
+ - Context management: 32K tokens
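The sampling parameters above can be illustrated on a toy distribution. A minimal sketch of how top-k and top-p (nucleus) filtering narrow the candidate token set before sampling; this is an illustration of the technique, not Ollama's internal implementation:

```python
def top_k_filter(probs: dict, k: int) -> dict:
    # Keep only the k most probable tokens, then renormalize.
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {t: p / total for t, p in kept}

def top_p_filter(probs: dict, p: float) -> dict:
    # Nucleus sampling: keep the smallest high-probability set whose
    # cumulative mass reaches p, then renormalize.
    kept, cum = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = pr
        cum += pr
        if cum >= p:
            break
    total = sum(kept.values())
    return {t: pr / total for t, pr in kept.items()}

# Toy next-token distribution.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_k_filter(probs, 2))   # 'a' and 'b' kept and renormalized
print(top_p_filter(probs, 0.9)) # keeps 'a', 'b', 'c'; 'd' is cut
```

The repeat penalty works differently: it scales down the probability of tokens that already appear in the context, discouraging loops.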