---
license: mit
datasets:
- nvidia/Llama-Nemotron-Post-Training-Dataset
language:
- en
- es
- ar
- fr
base_model:
- ykarout/phi4-deepseek-r1-distilled-v8-GGUF
- microsoft/phi-4
library_name: transformers
tags:
- deepseek
- r1
- reasoning
- phi-4
- math
- code
- chemistry
- science
- biology
- art
- unsloth
- finance
- legal
- medical
- text-generation-inference
---
|
# Phi-4 DeepSeek Distilled v8 GGUF

This repository contains GGUF quantized versions of the Phi-4 DeepSeek R1 Distilled model. These GGUF files are optimized for local inference with frameworks such as [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://ollama.ai/), and LM Studio.

## Model Information

- **Base Model**: Phi-4 DeepSeek R1 Distilled
- **Parameters**: 14.7B
- **Architecture**: Phi3
- **Context Length**: 16384 tokens
- **Training Data**: Phi-4 fine-tuned on DeepSeek R1 reasoning distillation data (nvidia/Llama-Nemotron-Post-Training-Dataset)
- **License**: MIT

## Available Quantizations

| File | Quantization | Size | Use Case |
|------|-------------|------|----------|
| phi4-deepseek-r1-distilled-v8-q8_0.gguf | Q8_0 | – | Highest quality, near-lossless; largest file |
| phi4-deepseek-r1-distilled-v8-q6_k.gguf | Q6_K | – | Very high quality with a moderate size reduction |
| phi4-deepseek-r1-distilled-v8-q5_k_m.gguf | Q5_K_M | – | Good balance of quality and size |
| phi4-deepseek-r1-distilled-v8-q4_k_m.gguf | Q4_K_M | – | Smallest of the set; best for limited VRAM |

The non-Q8_0 file names above follow the repository's naming pattern; check the repository file listing for exact names and sizes.

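To grab a single quantization without cloning the whole repository, you can use the `huggingface-cli` tool from `huggingface_hub`. A minimal sketch; the Q4_K_M file name is an assumption based on the Q8_0 naming pattern:

```bash
# Install the CLI if needed: pip install -U huggingface_hub
# Download one GGUF file from this repo into the current directory.
# The exact file name below is assumed from the repo's q8_0 naming pattern;
# check the repository file listing if the download fails.
huggingface-cli download ykarout/phi4-deepseek-r1-distilled-v8-GGUF \
  phi4-deepseek-r1-distilled-v8-q4_k_m.gguf --local-dir .
```
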
## Chat Template

This model uses the ChatML format with the following structure:

```
<|im_start|>system<|im_sep|>System message here<|im_end|>
<|im_start|>user<|im_sep|>User message here<|im_end|>
<|im_start|>assistant<|im_sep|>Assistant response here<|im_end|>
```

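If you call the model through a raw completion interface rather than a chat-aware frontend, you have to assemble this template yourself and end the prompt with an opening assistant header so generation starts in the right place. A minimal sketch, using the Q8_0 file from the llama.cpp section below; the question is a placeholder:

```bash
# Assemble a single-turn ChatML prompt by hand and pass it to llama.cpp.
# The trailing assistant header tells the model where to start completing.
PROMPT='<|im_start|>system<|im_sep|>You are a helpful assistant.<|im_end|>'
PROMPT+='<|im_start|>user<|im_sep|>What does Q4_K_M quantization mean?<|im_end|>'
PROMPT+='<|im_start|>assistant<|im_sep|>'
./main -m phi4-deepseek-r1-distilled-v8-q8_0.gguf -p "$PROMPT" -n 256
```
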
## Usage with Ollama

Create a custom Modelfile (paste this into a file named `Modelfile`):

```
FROM /replace/with/path/to/your/gguf-file.gguf

PARAMETER temperature 0.15
PARAMETER top_p 0.93
PARAMETER top_k 50
PARAMETER repeat_penalty 1.15

TEMPLATE """{{ if .System }}<|im_start|>system<|im_sep|>{{ .System }}<|im_end|>{{ end }}{{ range .Messages }}{{ if eq .Role "user" }}<|im_start|>user<|im_sep|>{{ .Content }}<|im_end|>{{ else if eq .Role "assistant" }}<|im_start|>assistant<|im_sep|>{{ .Content }}<|im_end|>{{ end }}{{ end }}<|im_start|>assistant<|im_sep|>"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
```
|
Then create and use your model:

```bash
ollama create phi4-deepseek-r1 -f Modelfile
ollama run phi4-deepseek-r1
```

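Once created, the model can also be queried programmatically through Ollama's local REST API, which listens on port 11434 by default. A minimal sketch; the prompt is a placeholder:

```bash
# Query the model via Ollama's REST API; "stream": false returns
# the whole completion as a single JSON object.
curl http://localhost:11434/api/generate -d '{
  "model": "phi4-deepseek-r1",
  "prompt": "Summarize the trade-offs of GGUF quantization in two sentences.",
  "stream": false
}'
```
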
## Usage with LM Studio

1. Use the model search option to find this repository on Hugging Face
2. Download and load the model
3. Set the chat parameters (top_p, top_k, repeat_penalty, etc.)
4. Chat with the model (LM Studio detects the chat template automatically, so unlike Ollama no manual template configuration is needed); to use LM Studio as a local API server instead, see the sketch below

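LM Studio can also serve the loaded model over an OpenAI-compatible HTTP API (start the server from its Developer view; the default port is 1234). A minimal sketch; the model identifier is a placeholder, so list the available models first:

```bash
# List the identifiers LM Studio assigns to loaded models
curl http://localhost:1234/v1/models

# Chat completion against the OpenAI-compatible endpoint; replace the
# "model" value with an identifier returned by the call above.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi4-deepseek-r1-distilled-v8",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.15
  }'
```
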
## Usage with llama.cpp

```bash
# Download the model from Hugging Face
wget https://huggingface.co/ykarout/phi4-deepseek-r1-distilled-v8-GGUF/resolve/main/phi4-deepseek-r1-distilled-v8-q8_0.gguf

# Run the model with llama.cpp
./main -m phi4-deepseek-r1-distilled-v8-q8_0.gguf -n 1024 --color -i -ins --chatml
```

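llama.cpp also ships an OpenAI-compatible HTTP server, which keeps the model loaded between requests. A minimal sketch; depending on your build the binaries are named `llama-server`/`llama-cli` or `server`/`main`, and the context size below matches the model's 16384-token limit:

```bash
# Serve the GGUF over llama.cpp's OpenAI-compatible HTTP server
./llama-server -m phi4-deepseek-r1-distilled-v8-q8_0.gguf -c 16384 --port 8080

# Query it from another shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```
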
## Benchmarks & Performance Notes

- Q8_0: Best quality; requires ~16GB VRAM for 4K context
- Q3_K_M: Good quality at roughly 60% size reduction relative to Q8_0; suitable for systems with 8GB+ VRAM

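
As a rough sanity check on the Q8_0 figure: an 8-bit quantization stores about one byte per weight, so 14.7B parameters come to roughly 14.7GB of weights on their own, and the KV cache plus compute buffers for a 4K context account for the remaining headroom in the ~16GB estimate.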