# DeepSeek-R1-Distill-Llama-8B-Stateful-CoreML

This repository contains a CoreML conversion of the DeepSeek-R1-Distill-Llama-8B model optimized for Apple Silicon devices. This conversion features stateful key-value caching for efficient text generation.

## Model Description

[DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) is an 8-billion-parameter language model from the DeepSeek-AI team. It is built on the Llama architecture and was created by distilling the reasoning capability of the much larger DeepSeek-R1 into the smaller model, preserving much of its performance at a fraction of the parameter count.

This CoreML conversion provides:
- Full compatibility with Apple Silicon devices (M1, M2, M3 series)
- Stateful inference with KV-caching for efficient text generation
- Optimized performance for on-device deployment

## Technical Specifications

- **Base Model**: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- **Parameters**: 8 billion
- **Context Length**: Configurable (default: 64 tokens, expandable within memory constraints)
- **Precision**: FP16
- **File Format**: .mlpackage
- **Deployment Target**: macOS 15+
- **Architecture**: Stateful LLM with key-value caching
- **Input Features**: Flexible input size with dynamic shape handling

## Key Features

- **Stateful Inference**: The model implements a custom SliceUpdateKeyValueCache to maintain conversation state between inference calls, significantly improving generation speed (see the sketch after this list).
- **Dynamic Input Shapes**: Supports variable input lengths through coremltools' RangeDim specification.
- **Optimized Memory Usage**: Efficiently manages the key-value cache to minimize memory footprint.
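
The cache implementation itself is not reproduced in this README. The core idea of a slice-update cache, though, is a preallocated, fixed-size tensor that each step writes into via slice assignment instead of concatenating and reallocating. A minimal PyTorch sketch with illustrative names and shapes (not the repository's actual SliceUpdateKeyValueCache):

```python
import torch

class SliceUpdateKVCacheSketch:
    """Illustrative fixed-size KV cache updated in place by slice assignment."""

    def __init__(self, num_layers: int, num_kv_heads: int, max_context: int, head_dim: int):
        # Preallocated once; CoreML can persist tensors like these across calls as state.
        shape = (num_layers, 1, num_kv_heads, max_context, head_dim)
        self.k = torch.zeros(shape, dtype=torch.float16)
        self.v = torch.zeros(shape, dtype=torch.float16)

    def update(self, layer: int, pos: int, new_k: torch.Tensor, new_v: torch.Tensor):
        """Write this step's keys/values at [pos : pos + seq_len], no reallocation."""
        seq_len = new_k.shape[-2]
        self.k[layer, :, :, pos : pos + seq_len, :] = new_k
        self.v[layer, :, :, pos : pos + seq_len, :] = new_v
        # Return views over the valid prefix for use in attention.
        return (
            self.k[layer, :, :, : pos + seq_len, :],
            self.v[layer, :, :, : pos + seq_len, :],
        )
```

Because the buffers never change size or address, a cache like this maps cleanly onto CoreML's mutable model state.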

## Implementation Details

This conversion utilizes:
- A custom KvCacheStateLlamaForCausalLM wrapper around the Hugging Face Transformers implementation (sketched below)
- CoreML's state management capabilities for maintaining KV caches between inference calls
- Proper buffer registration to ensure state persistence
- Dynamic tensor shapes to accommodate various input and context lengths
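
The wrapper source is not included in this README. The buffer-registration pattern it describes might look roughly like the following; the class name comes from the list above, while the buffer names, shapes, and forward signature are assumptions for illustration:

```python
import torch
from transformers import LlamaForCausalLM

class KvCacheStateLlamaForCausalLM(torch.nn.Module):
    """Holds the KV cache in registered buffers so the converter can expose
    them as persistent CoreML state (details here are illustrative)."""

    def __init__(self, model_path: str, max_context: int = 64):
        super().__init__()
        self.model = LlamaForCausalLM.from_pretrained(
            model_path, torch_dtype=torch.float16
        )
        cfg = self.model.config
        head_dim = cfg.hidden_size // cfg.num_attention_heads
        shape = (cfg.num_hidden_layers, 1, cfg.num_key_value_heads,
                 max_context, head_dim)
        # register_buffer is what lets these tensors survive tracing and be
        # mapped to CoreML state during conversion.
        self.register_buffer("keyCache", torch.zeros(shape, dtype=torch.float16))
        self.register_buffer("valueCache", torch.zeros(shape, dtype=torch.float16))

    def forward(self, input_ids: torch.Tensor, causal_mask: torch.Tensor) -> torch.Tensor:
        # A real wrapper threads keyCache/valueCache into the attention layers
        # (e.g., through a custom transformers Cache) so each call reuses past
        # state; this sketch only shows the input/output contract.
        return self.model(input_ids=input_ids, attention_mask=causal_mask).logits
```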

## Usage

The model can be loaded and used with CoreML in your Swift or Python projects. In Python, the stateful-prediction API requires coremltools 8 or later:

```python
import coremltools as ct
import numpy as np

# Load the model
model = ct.models.MLModel("DeepSeek-R1-Distill-Llama-8B.mlpackage")

# Create a fresh KV-cache state for this generation session
state = model.make_state()

# Prepare inputs for inference. Token IDs would normally come from the
# tokenizer; the values and shapes below are illustrative.
input_ids = np.array([[128000, 9906, 1917]], dtype=np.int32)  # (1, seq_len)
seq_len = input_ids.shape[1]
causal_mask = np.triu(  # 0 on/below the diagonal, -inf above it
    np.full((1, 1, seq_len, seq_len), -np.inf, dtype=np.float16), k=1
)

# Run inference
output = model.predict(
    {"inputIds": input_ids, "causalMask": causal_mask},
    state=state,
)
```
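
Continuing the snippet above, a simple greedy decoding loop could feed one token at a time while the state carries the cache. The output name `logits` and the single-row mask shape are assumptions about this conversion; check `model.get_spec()` for the actual input and output names:

```python
# Hypothetical continuation: greedy decoding, one token per call.
for step in range(20):
    logits = output["logits"]                 # assumed name; shape (1, seq_len, vocab)
    next_id = int(np.argmax(logits[0, -1]))   # greedy pick at the last position
    pos = seq_len + step                      # absolute position of the new token
    input_ids = np.array([[next_id]], dtype=np.int32)
    # One query row that may attend to every cached position (all zeros = no masking).
    causal_mask = np.zeros((1, 1, 1, pos + 1), dtype=np.float16)
    output = model.predict(
        {"inputIds": input_ids, "causalMask": causal_mask},
        state=state,
    )
```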

## Conversion Process

The model was converted using CoreML Tools with the following steps:
1. Loading the original model from Hugging Face
2. Wrapping it with custom state management
3. Tracing the wrapped model with PyTorch's JIT
4. Converting to CoreML format with state specifications (sketched after this list)
5. Saving in the .mlpackage format
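
The conversion script itself is not part of this README. Under the assumptions of the wrapper sketch above (registered keyCache/valueCache buffers), steps 3–5 might look roughly like this with coremltools 8:

```python
import coremltools as ct
import numpy as np
import torch

# Step 3: trace the wrapped model (see the KvCacheStateLlamaForCausalLM sketch above).
wrapper = KvCacheStateLlamaForCausalLM("deepseek-ai/DeepSeek-R1-Distill-Llama-8B").eval()
example_ids = torch.zeros((1, 8), dtype=torch.int64)
example_mask = torch.zeros((1, 1, 8, 8), dtype=torch.float16)
traced = torch.jit.trace(wrapper, (example_ids, example_mask))

# Step 4: dynamic sequence length via RangeDim; the registered buffers become state.
seq = ct.RangeDim(lower_bound=1, upper_bound=64, default=1)
mlmodel = ct.convert(
    traced,
    inputs=[
        ct.TensorType(name="inputIds", shape=(1, seq), dtype=np.int32),
        ct.TensorType(name="causalMask", shape=(1, 1, seq, seq), dtype=np.float16),
    ],
    outputs=[ct.TensorType(name="logits", dtype=np.float16)],
    states=[
        ct.StateType(wrapped_type=ct.TensorType(shape=tuple(wrapper.keyCache.shape),
                                                dtype=np.float16), name="keyCache"),
        ct.StateType(wrapped_type=ct.TensorType(shape=tuple(wrapper.valueCache.shape),
                                                dtype=np.float16), name="valueCache"),
    ],
    minimum_deployment_target=ct.target.macOS15,
)

# Step 5: save as .mlpackage
mlmodel.save("DeepSeek-R1-Distill-Llama-8B.mlpackage")
```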

## Requirements

To use this model:
- An Apple Silicon Mac (M1/M2/M3 series)
- macOS 15 or later
- At least 16 GB of RAM recommended

## Limitations

- The model requires significant memory for inference, especially with longer contexts
- Performance is highly dependent on the device's Neural Engine capabilities
- The default configuration supports a context length of 64 tokens, but this can be adjusted at conversion time

## License

This model conversion inherits the license of the original DeepSeek-R1-Distill-Llama-8B model.

## Acknowledgments

- [DeepSeek-AI](https://github.com/deepseek-ai) for creating and releasing the original model
- [Hugging Face](https://huggingface.co/) for hosting the model and providing the Transformers library
- Apple for developing the CoreML framework

## Citation

If you use this model in your research, please cite both the original DeepSeek model and this conversion.