Commit 2dec9fd (verified) by anthonymikinka · Parent(s): 515f6e4

Update README.md

Files changed (1): README.md (+103 −3)
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
pipeline_tag: text-generation
tags:
- llama
- conversational
---

# DeepSeek-R1-Distill-Llama-8B-Stateful-CoreML

This repository contains a CoreML conversion of the DeepSeek-R1-Distill-Llama-8B model optimized for Apple Silicon devices. This conversion features stateful key-value caching for efficient text generation.

## Model Description

[DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) is a distilled 8-billion-parameter language model from the DeepSeek-AI team. It is built on the Llama architecture and distilled to maintain performance at a reduced parameter count.

This CoreML conversion provides:
- Full compatibility with Apple Silicon devices (M1, M2, and M3 series)
- Stateful inference with KV caching for efficient text generation
- Optimized performance for on-device deployment

## Technical Specifications

- **Base Model**: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- **Parameters**: 8 billion
- **Context Length**: Configurable (default: 64 tokens, expandable within memory constraints)
- **Precision**: FP16
- **File Format**: .mlpackage
- **Deployment Target**: macOS 15+
- **Architecture**: Stateful LLM with key-value caching
- **Input Features**: Flexible input size with dynamic shape handling
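
The FP16 precision largely determines the memory footprint. A back-of-the-envelope estimate (the layer and head counts below are assumptions taken from the stock Llama-3.1-8B configuration, not read from this package):

```python
# Rough memory math behind the hardware recommendation below.
params = 8_000_000_000          # 8 billion parameters
fp16_bytes = 2                  # FP16 stores each weight in 2 bytes
weights_gb = params * fp16_bytes / 1e9   # ~16 GB for the weights alone

# Per-token KV cache, assuming the stock Llama-3.1-8B layout
# (32 layers, 8 KV heads, head dim 128 -- an assumption, verify in config.json):
layers, kv_heads, head_dim = 32, 8, 128
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * fp16_bytes  # K and V
cache_mb = kv_bytes_per_token * 64 / 1e6  # at the default 64-token context

print(f"weights ~{weights_gb:.0f} GB, KV cache ~{cache_mb:.1f} MB")
```

Roughly 16 GB for the weights alone is the arithmetic behind the RAM recommendation further down; at the default 64-token context, the KV cache is comparatively tiny.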

## Key Features

- **Stateful Inference**: The model implements a custom SliceUpdateKeyValueCache to maintain conversation state between inference calls, significantly improving generation speed.
- **Dynamic Input Shapes**: Supports variable input lengths through a RangeDim specification.
- **Optimized Memory Usage**: Efficiently manages the key-value cache to minimize memory footprint.
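
The repository's actual SliceUpdateKeyValueCache is not reproduced here; the sketch below illustrates the underlying idea in plain NumPy: preallocate the cache once and write each step's keys and values into a slice, so nothing is reallocated between calls (class name and shapes are illustrative):

```python
import numpy as np

class SliceUpdateKVCache:
    """Toy slice-update KV cache: allocate once, write new entries in place."""
    def __init__(self, max_len, num_heads, head_dim):
        shape = (num_heads, max_len, head_dim)
        self.k = np.zeros(shape, dtype=np.float16)
        self.v = np.zeros(shape, dtype=np.float16)
        self.pos = 0  # number of tokens cached so far

    def update(self, new_k, new_v):
        """Write keys/values for the newest tokens with a slice assignment."""
        n = new_k.shape[1]
        self.k[:, self.pos:self.pos + n] = new_k
        self.v[:, self.pos:self.pos + n] = new_v
        self.pos += n
        # Attention only ever reads the filled prefix of the cache
        return self.k[:, :self.pos], self.v[:, :self.pos]

cache = SliceUpdateKVCache(max_len=64, num_heads=4, head_dim=8)
# Prompt step with three tokens, then a single decode step
k, v = cache.update(np.ones((4, 3, 8), np.float16), np.ones((4, 3, 8), np.float16))
k, v = cache.update(np.ones((4, 1, 8), np.float16), np.ones((4, 1, 8), np.float16))
# cache now holds 4 tokens
```

Because each decode step touches only a small slice, the cost per generated token stays constant instead of growing with context length, which is where the speedup over stateless inference comes from.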

## Implementation Details

This conversion utilizes:
- A custom KvCacheStateLlamaForCausalLM wrapper around the Hugging Face Transformers implementation
- CoreML's state management capabilities for maintaining KV caches between inference calls
- Proper buffer registration to ensure state persistence
- Dynamic tensor shapes to accommodate various input and context lengths

## Usage

The model can be loaded and used with CoreML in your Swift or Python projects:

```python
import coremltools as ct

# Load the model
model = ct.models.MLModel("DeepSeek-R1-Distill-Llama-8B.mlpackage")

# Stateful models keep their KV cache in an explicit state object
# (coremltools >= 8.0); create one and reuse it across predict calls.
state = model.make_state()

# Prepare inputs for inference
# ...

# Run inference, passing the state so the KV cache persists
output = model.predict({
    "inputIds": input_ids,
    "causalMask": causal_mask
}, state=state)
```
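
The elided input preparation depends on the tokenizer and on the mask convention used during conversion. Assuming an additive FP16 causal mask of shape (1, 1, seq_len, seq_len), which is a common convention (verify against the .mlpackage's input descriptions), the inputs could be built like this:

```python
import numpy as np

def make_causal_mask(seq_len):
    """Additive causal mask: 0 at positions a token may attend to,
    -inf at future positions (upper triangle above the diagonal)."""
    mask = np.triu(np.full((seq_len, seq_len), -np.inf, dtype=np.float16), k=1)
    return mask[None, None]  # shape (1, 1, seq_len, seq_len)

# Placeholder ids; in practice these come from the model's tokenizer
input_ids = np.array([[1, 2, 3]], dtype=np.int32)
causal_mask = make_causal_mask(input_ids.shape[1])
```

These arrays then feed the `inputIds` and `causalMask` inputs of the `predict` call above.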

## Conversion Process

The model was converted using CoreML Tools with the following steps:
1. Loading the original model from Hugging Face
2. Wrapping it with custom state management
3. Tracing with PyTorch's JIT
4. Converting to CoreML format with state specifications
5. Saving in the .mlpackage format
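
In outline, the steps above correspond to a script like the following pseudocode (not runnable as written: rerunning the conversion requires the original checkpoint, and the exact argument names may differ from the real conversion script):

```
# 1. Load the original model and wrap it with state management
hf_model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
wrapped  = KvCacheStateLlamaForCausalLM(hf_model)   # registers K/V caches as buffers

# 2-3. Trace with PyTorch's JIT on example inputs
traced = torch.jit.trace(wrapped, (example_input_ids, example_causal_mask))

# 4. Convert, declaring flexible input shapes and the caches as CoreML state
mlmodel = ct.convert(
    traced,
    inputs=[<inputIds with a RangeDim sequence axis>, <causalMask likewise>],
    states=[<key cache state>, <value cache state>],
    minimum_deployment_target=<macOS 15>,
)

# 5. Save in the .mlpackage format
mlmodel.save("DeepSeek-R1-Distill-Llama-8B.mlpackage")
```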

## Requirements

To use this model:
- Apple Silicon Mac (M1/M2/M3 series)
- macOS 15 or later
- Minimum 16 GB RAM recommended

## Limitations

- The model requires significant memory for inference, especially with longer contexts
- Performance is highly dependent on the device's Neural Engine capabilities
- The default configuration supports a context length of 64 tokens, but this can be adjusted

## License

This model conversion inherits the MIT license of the original DeepSeek-R1-Distill-Llama-8B model.

## Acknowledgments

- [DeepSeek-AI](https://github.com/deepseek-ai) for creating and releasing the original model
- [Hugging Face](https://huggingface.co/) for hosting the model and providing the Transformers library
- Apple for developing the CoreML framework

## Citation

If you use this model in your research, please cite both the original DeepSeek model and this conversion.